{"cells":[{"cell_type":"markdown","metadata":{"_id":"FDCA025F718B4FC498E8A5BBE44A6E43","id":"B520F324D3AB4EC79974E7B7EFABC4FB","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":[" # <center>Lecture 2: Bayes' Rule  </center>  \n"," \n"," ## <center> Instructor: Dr. Hu Chuan-Peng  </center>"]},{"cell_type":"markdown","metadata":{"_id":"9B23E8F257D24245B035991344E14DE8","id":"04082792AC0E4919A238921A398272C6","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["## Part 1: Heywhale (和鲸) Platform: Integrated Teaching and Practice"]},{"cell_type":"markdown","metadata":{"_id":"CBD7BE6CEACC42E3AF2D5C273FF36CB9","id":"BEFCC0314A7D4512A71949B9014BB295","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["This semester's Bayesian course will use the Heywhale platform for lectures and coding practice. Please register a Heywhale account in advance.  \n","\n","Instructions for setting up the runtime environment on Heywhale:  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjmwr1smy0.jpeg?imageView2/0/w/960/h/960)  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjmwr8pj5w.jpeg?imageView2/0/w/960/h/960)  \n","\n"]},{"cell_type":"markdown","metadata":{"_id":"24FAE016A55E44179F8054656732F0B1","id":"C35CF4E9CA30486B9A933D02510DD77A","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["More importantly: you can ask any question in the WeChat group or by posting on the platform.  \n","\n","The TAs and the instructor will reply as soon as possible 🚀.  \n","\n","\n","![Image Name](https://cdn.kesci.com/upload/sjkvf5nlsl.jpeg?imageView2/0/w/960/h/960)  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjkvfd5d65.jpeg?imageView2/0/w/960/h/960)  
\n","\n"]},{"cell_type":"markdown","metadata":{"_id":"3B3A1C3022C74C848C83413419C2C33F","id":"7AEA60D19ED649F1987F8443BFED18A2","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["You can also ask questions on Gitee.  \n","Visit the Gitee repository at:  \n","https://gitee.com/hcp4715/bayesian-analysis-nnupsy  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjkvfsm4c1.jpeg?imageView2/0/w/960/h/960)  \n"]},{"cell_type":"markdown","metadata":{"_id":"959962DB732A4DB59441904EFB17C296","id":"9F87F45FBA584B159AD797C83EC0AFCA","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":[" ## Part 2: A Bayesian Model for a Single Event"]},{"cell_type":"markdown","metadata":{"_id":"97A69A3273D64E7AA2D2CC8C6977E4B9","id":"A5FB09585E8E4156A8760A4184E67F2E","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["## **“Let's start with an event everyone is familiar with”**  \n","\n","When reading the literature, have you ever wondered: can I trust the paper I am reading?  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjpbvjqzx2.jpg?imageView2/0/w/640/h/640)  \n"]},{"cell_type":"markdown","metadata":{"_id":"2E3BD9AE42324A669E83F4E27A2D3047","id":"08164BD956FB4ED9AEBDEC8683FC2684","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["In 2015, the **Open Science Collaboration** published a paper in *Science* reporting that only **36%–47%** of findings in cognitive and social psychology could be successfully replicated.  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjos6fmkbs.png?imageView2/0/w/960/h/960)  \n","\n","\n","> Open Science Collaboration. Estimating the reproducibility of psychological 
science. *Science*, *349*, aac4716 (2015). https://doi.org/10.1126/science.aac4716"]},{"cell_type":"markdown","metadata":{"_id":"927FC0DD195D47A09AFCAFEECAA90F9E","id":"93BFD96EA4234697B38A9E76E81579CD","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["This paper, together with the wider discussion it represents, sparked the debate over the “replication crisis” in psychology [(Hu et al., 2016)](https://journal.psych.ac.cn/xlkxjz/CN/10.3724/SP.J.1042.2016.01504).  \n","\n","Knowing this, does it change how much we believe the results of the papers we read?  \n","\n","Suppose we accept the conclusion of the *Science* paper and tentatively believe that about **40%** of psychology experiments are replicable. We take this figure as our initial “belief” about a paper.  \n","\n","Would new research on the replicability of psychological studies change this belief further?"]},{"cell_type":"markdown","metadata":{"_id":"A17CA82F36AF4B1B9D9D426183B62303","id":"FE68156606064582965D4F883A597801","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["In 2024, a study analyzing data from 299 preregistered replication studies found that replicable studies tend to be written in confident, transparent, and certain language, whereas non-replicable studies tend to be vague and to use “peripheral” persuasion techniques.  \n","\n","This new study may help us **update** our predictions about psychology papers.  \n","\n","Now suppose we randomly draw one of these 299 papers. How should we evaluate its replicability?  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjoly6dhft.png?imageView2/0/w/960/h/960)  \n","\n","\n","> Herzenstein, M., Rosario, S., Oblander, S., & Netzer, O. (2024). The language of (non)replicable social science. *Psychological Science*, 09567976241254037. 
https://doi.org/10.1177/09567976241254037  \n"]},{"cell_type":"markdown","metadata":{"_id":"0431DA1DBB18442E8023EB69490F4AB1","id":"6A211583330A425087AFE8E531EB516A","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Given the first study (about 40% of studies replicate) and the new evidence (replicable studies tend to be written in certain language), what should we believe about the replicability of a randomly drawn study?  \n","\n","If we relied only on the *Science* result, we would say this study has about a 40% chance of being replicated. But that would leave the information from the 2024 study unused.  "]},{"cell_type":"markdown","metadata":{"_id":"8276AE623A8941DD95D88D69E9C128F5","id":"671F102509D84E7ABAE180900E513E50","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["According to Herzenstein et al. (2024), 56% of replicable studies used a certain language style, compared with 45% of non-replicable studies.  \n","\n","This gives us the following key facts:  \n","- The probability that a psychology study is replicable is 40%  \n","- The probability that a psychology study is not replicable is 60%  \n","- Among replicable studies, the probability of using certain language is 56%  \n","- Among non-replicable studies, the probability of using certain language is 45%  \n","\n","\n","![Image Name](https://cdn.kesci.com/upload/sjoxpj2ivv.png?imageView2/0/w/640/h/640)"]},{"cell_type":"markdown","metadata":{"_id":"21789174E0044E24B1CA5DDC803C1D4F","id":"16743F0B8EA64A5BA5A6FBE093EAD1D1","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["With this information, we can do some simple arithmetic:  \n","\n","- Probability that a study is replicable and uses certain language: $P = 0.40 \\times 0.56 = 0.224$  \n","- Probability that a study is replicable but does not use certain language: $P = 0.40 \\times (1 - 0.56) = 0.176$  \n","- Probability that a study is not replicable but uses certain language: $P = 0.60 \\times 0.45 = 0.27$  \n","- Probability that a study is not replicable and does not use certain language: $P = 0.60 \\times (1 - 0.45) = 0.33$  \n","\n","\n","![Image Name](https://cdn.kesci.com/upload/sjp0bqwjgl.png?imageView2/0/w/960/h/960)  
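
The four products above can be reproduced with a few lines of Python. This is a minimal sketch that reuses only the numbers stated in this section (prior 0.40 / 0.60, certain-language rates 0.56 / 0.45); the dictionary names are illustrative and not part of the course code.

```python
# Prior: probability that a study is / is not replicable (from the text)
prior = {'replicable': 0.40, 'not replicable': 0.60}
# Probability of certain language within each group (from the text)
p_certain = {'replicable': 0.56, 'not replicable': 0.45}

# Joint probability of each (replicability, language style) combination:
# P(group and style) = P(group) * P(style | group)
joint = {}
for group, p_group in prior.items():
    joint[(group, 'certain')] = p_group * p_certain[group]
    joint[(group, 'not certain')] = p_group * (1 - p_certain[group])

for (group, style), p in joint.items():
    print(f'{group:>14} / {style:<11}: {p:.3f}')

# The four combinations are exhaustive, so their probabilities sum to 1
print(round(sum(joint.values()), 3))
```

The final sum is a quick sanity check: if the four joint probabilities did not add up to 1, one of the products above would be wrong.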
\n","\n"]},{"cell_type":"markdown","metadata":{"_id":"9A966B81FDF3433B90DB37EEA7449D65","id":"3CAA210B428B4726AF564BF99B932FAC","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["If the paper we draw uses a certain language style, how likely is it to be replicable?  \n","\n","$0.224 / (0.224 + 0.27) \\approx 0.453$"]},{"cell_type":"markdown","metadata":{"_id":"7B86ECDD3D8C4CF4BC570379B7B8D4E0","id":"F30C9DF8692247DD80290DEE244DF8D5","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["In this simple example, we have in fact performed a “Bayesian update” of our belief in light of evidence.  \n","\n","Let's now revisit this example more carefully.  \n","\n","We explore part of the real data from Herzenstein et al. (2024), including the study identifier (title), whether the study was replicated (replicated), how certain the language describing the results is (certain), and how positive the language is (posemo)."]},{"cell_type":"code","execution_count":1,"metadata":{"_id":"6650178E299B46A59FCD49B10A2F19CA","collapsed":false,"id":"C4211626E59A44BCB2085B9505C99F01","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Students not enrolled in the course can use the Python image on the Heywhale community and run the code below to install the required packages; this takes 3-5 minutes\n","# A dedicated public community image with a preconfigured environment will be provided later\n","# To run it, uncomment the line below (delete the leading “#”):\n","# !conda install -y graphviz bambi=0.13.0 pymc=5.16.2 PreliZ=0.9.0 ipympl=0.9.4 pingouin=0.5.4\n","\n","# Docker setup and usage tutorial: https://zhuanlan.zhihu.com/p/719739087\n","# docker pull hcp4715/pybaysian:latest\n","# docker run -it --rm -p 8888:8888 hcp4715/pybaysian:latest"]},{"cell_type":"code","execution_count":18,"metadata":{"_id":"5E2EC868AAD6488CA5E5DFADCF965A1A","collapsed":false,"id":"76867943A901493595AFDC8873B9561A","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Import the data loading and processing package: pandas\n","import pandas as pd\n","# 
Import the numerical and array package: numpy\n","import numpy as np\n","# Import the basic plotting tools: matplotlib\n","import matplotlib.pyplot as plt\n","\n","# Load the example data with pandas (Heywhale path first, then a local fallback)\n","try:\n","    df = pd.read_csv(\"/home/mw/input/bayes3797/replicated_language_cleaned.csv\")\n","except FileNotFoundError:\n","    df = pd.read_csv('data/replicated_language_cleaned.csv')\n","\n","# Drop the study_name column\n","df = df.drop('study_name', axis=1)\n","\n","# Set an APA 7-style plotting theme\n","plt.rcParams.update({\n","    'figure.figsize': (4, 3),      # Figure size\n","    'font.size': 12,               # Base font size\n","    'axes.titlesize': 12,          # Title font size\n","    'axes.labelsize': 12,          # Axis label font size\n","    'xtick.labelsize': 12,         # x tick label font size\n","    'ytick.labelsize': 12,         # y tick label font size\n","    'lines.linewidth': 1,          # Line width\n","    'axes.linewidth': 1,           # Axis line width\n","    'axes.edgecolor': 'black',     # Black axis lines\n","    'axes.facecolor': 'white',     # White axes background\n","    'xtick.direction': 'in',       # x ticks point inward\n","    'ytick.direction': 'out',      # y ticks point outward\n","    'xtick.major.size': 6,         # Major x tick length\n","    'ytick.major.size': 6,         # Major y tick length\n","    'xtick.minor.size': 4,         # Minor x tick length (if minor ticks are enabled)\n","    'ytick.minor.size': 4,         # Minor y tick length (if minor ticks are enabled)\n","    'xtick.major.width': 1,        # Major x tick width\n","    'ytick.major.width': 1,        # Major y tick width\n","    'xtick.minor.width': 0.5,      # Minor x tick width (if minor ticks are enabled)\n","    'ytick.minor.width': 0.5,      # Minor y tick width (if minor ticks are enabled)\n","    'ytick.labelleft': True,       # Show y tick labels on the left\n","    'ytick.labelright': False      # 
Hide y tick labels on the right\n","})"]},{"cell_type":"code","execution_count":3,"metadata":{"_id":"7220F66CE63E4264AD38AC5B6A7C50A3","collapsed":false,"id":"9F39DBCF0C654E2C9518A6F1DBED9F5D","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["df.head()"]},{"cell_type":"markdown","metadata":{"_id":"D1AB055B5890484881A9BA656B67D054","id":"E55F6A87BA9D426495965ED34E9059CF","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### Prior and Data  \n","\n","Recall the event we want to evaluate: how replicable is a study drawn at random from these 299 psychology studies?  \n","\n","Before evaluating it, we know that a large-scale replication project published in *Science* in 2015 found that about 40% of psychology studies could be replicated.  \n","\n","These 299 studies differ in their language style:"]},{"cell_type":"code","execution_count":4,"metadata":{"_id":"69527C9D804D4A6B931487113C2E71BF","collapsed":false,"id":"6CAFB9DF235941A9B9D2CFF5786417ED","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Data preprocessing\n","# Median of the 'certain' column\n","median_certain = df['certain'].median()\n","\n","# New column: 1 if 'certain' is above the median (certain style), 0 if at or below it\n","df['language_style'] = (df['certain'] > median_certain).astype(int)\n","\n","# Show the result\n","df.head()"]},{"cell_type":"markdown","metadata":{"_id":"7D0BAB067DCB4951B131492739ACE9A6","id":"4CDFF11C64264DDB8FD8F372F508681B","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Whether a study replicates is related to its language style:  \n","\n","56% (71/126) of the replicable studies used a certain language style;  
\n","\n","about 45% (78/173) of the non-replicable studies used a certain language style."]},{"cell_type":"code","execution_count":5,"metadata":{"_id":"EE563704B3464650B00D8CDEB623D3E9","collapsed":false,"id":"535BB55719C14AD2A3F5060CFBCED79F","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Count and percentage of each level of 'replicated'\n","level_counts = df['replicated'].value_counts()\n","level_percentages = df['replicated'].value_counts(normalize=True) * 100\n","\n","# Round percentages to two decimal places\n","level_percentages = level_percentages.round(2)\n","\n","# Combine the results into a new DataFrame\n","result_df1 = pd.DataFrame({'count': level_counts, 'percentage': level_percentages})\n","# Show the result (0 = not replicable, 1 = replicable)\n","result_df1"]},{"cell_type":"code","execution_count":6,"metadata":{"_id":"3216FB04D2094FABBE7FBDE352C58000","collapsed":false,"id":"51A7DAF57C704FF386D57E0C7F5D04A0","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Count studies in each combination of replication status (rows) and language style (columns)\n","result_df2 = df.groupby(['replicated', 'language_style']).size().unstack()\n","# Result\n","result_df2"]},{"cell_type":"markdown","metadata":{"_id":"D7B1CA15338E40EC923C2CD12E25E9C3","id":"8ACE15B0E4214BF8A86D726BE5BC0F1A","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### Prior and Data  \n","\n","#### Prior  \n","\n","In our example, the belief we hold about a study's replicability before evaluating it is called the prior in Bayesian statistics.  \n","\n","Suppose our belief is shaped by the 2015 *Science* paper, so we believe that about 40% of psychology experiments are replicable. This is our belief before we learn anything about the specific study.  \n","\n","- Prior: an initial estimate of the probability of an event, based on existing knowledge, experience, or subjective judgment, made before observing the specific data.  \n","\n","In this example, the 40% estimate is our prior belief based on the existing literature and domain experience: before looking at the specific paper, we assume it has a 40% chance of being successfully replicated.  
\n"]},{"cell_type":"markdown","metadata":{"_id":"3AFE8BA367D04C728F6936BFE83C68D8","id":"8510F6F814694D158D13B3A08811F75E","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Prior vs. Data  \n","\n","Once we learn that the study being evaluated comes from Herzenstein et al. (2024), we gain new information, which we call the data.  \n","\n","We now have two pieces of information:  \n","\n","- Prior: about 40% of studies are replicable.  \n","- Data: 56% of replicable studies used a certain language style; about 45% of non-replicable studies did.  \n","\n","How should we reason from here?"]},{"cell_type":"markdown","metadata":{"_id":"B32FEE0C2D57457A83455C875FB2E8E8","id":"38E45E9401034D209F2599EA4B0BD999","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Should we strike a balance between the prior and the data?  \n","\n","That is exactly the Bayesian approach: update the prior in light of the data.  \n","\n","$$  \n","\\text{Posterior} = \\frac{\\text{data} \\times \\text{prior}}{\\text{average probability of data}}  \n","$$  \n","\n","![Image Name](https://cdn.kesci.com/upload/sjkvl0f6gx.png?imageView2/0/w/960/h/960)  "]},{"cell_type":"markdown","metadata":{"_id":"3CC4CAB875C945868176443036FE5CEF","id":"1981F998E95F4132B166ED49ECA94AA0","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Prior probability  \n","Let's now describe this information in more formal language.  \n","\n","If a psychology study **can** be independently replicated by other researchers, we say that a particular event has occurred.  \n","\n","We denote this event by $B$.  \n","\n","If a psychology study **cannot** be independently replicated by other researchers, we say that the event has **not** occurred, denoted $B^{c}$ (the complement of $B$).  \n","\n","Based on the 2015 *Science* paper, we have:  \n","\n","$$  \n","P(B) = 0.40, \\qquad P(B^{c}) = 0.60  \n","$$  
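
The prior model can be written down as a small Python dictionary and checked against the basic requirements of a probability model. This is a sketch; the keys `B` and `B_c` are illustrative labels for replicable and not replicable.

```python
import math

# Prior model for one study: 'B' = replicable, 'B_c' = not replicable
prior = {'B': 0.40, 'B_c': 0.60}

# Each prior probability must lie between 0 and 1
all_in_range = all(0 <= p <= 1 for p in prior.values())
# B and its complement cover every possibility, so the total must be 1
sums_to_one = math.isclose(sum(prior.values()), 1.0)

print(all_in_range, sums_to_one)
```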
"]},{"cell_type":"markdown","metadata":{"_id":"84A1E61DCF1B49E3961E5D2DE1354729","id":"BDF24AD9A3F44DC1B51F75FED304E021","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["<style>  \n","table {  \n","    width: 100%;  \n","    table-layout: auto;  \n","}  \n","</style>  \n","\n","| Event      | **$B$**   | **$B^{c}$** | **Total** |  \n","|------------|-----------|-------------|-----------|  \n","| **probability** | **0.4** | **0.6**     | **1**     |  \n","\n","\n","In other words, before we evaluate the study in question, our prior belief about event $B$ is $P(B)$; this is also called the prior model.  \n","\n","To be a valid probability model, it must:  \n","\n","(1) account for all possible events (every paper must be either replicable or not replicable; there is no other possibility);  \n","\n","(2) assign a prior probability to each event;  \n","\n","(3) have probabilities that add up to 1."]},{"cell_type":"markdown","metadata":{"_id":"194A3554CB3042419CEE8B257085C841","id":"15789B51556D43DEBD61C978D1E91321","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### Data model (conditional probability and likelihood)  \n","\n","Following the way we built the prior model, we can also describe the new information about the target study formally, using a model (i.e., formulas).  \n","- Let $A$ denote the event that a study uses a certain language style.  \n","\n","We want to formalize the following statement:  \n","\n","**56% of replicable studies used a certain language style; about 45% of non-replicable studies did.**  \n","\n","**Equivalently: 56% of replicable studies used a certain language style and 44% did not; about 45% of non-replicable studies used a certain language style and 55% did not.**  \n","\n","We formalize the data with conditional probabilities that quantify how likely a paper is to use certain language:  \n","\n","$$  \n","P(A|B) \\approx 56\\%  \n","$$  \n","- When a study is replicable, the probability that it uses certain language is about 56%.  \n","\n","$$  \n","P(A|B^{c}) \\approx 45\\%  \n","$$  \n","- When a study is not replicable, the probability that it uses certain language is about 45%.  
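
The 56% and 45% figures are rounded versions of the group counts quoted earlier in this section (71 of 126 replicable studies and 78 of 173 non-replicable studies). A minimal sketch that recovers the conditional probabilities from those counts by hand, without reading the data file; the variable names are illustrative:

```python
# Counts quoted earlier in the text: 71 of 126 replicable studies and
# 78 of 173 non-replicable studies used a certain language style
n_rep, n_rep_certain = 126, 71
n_nonrep, n_nonrep_certain = 173, 78

# P(A | B): share of replicable studies that use certain language
p_A_given_B = n_rep_certain / n_rep
# P(A | B^c): share of non-replicable studies that use certain language
p_A_given_Bc = n_nonrep_certain / n_nonrep

print(round(p_A_given_B, 3), round(p_A_given_Bc, 3))
# The complements P(A^c | B) and P(A^c | B^c)
print(round(1 - p_A_given_B, 3), round(1 - p_A_given_Bc, 3))
```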
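\n"]},{"cell_type":"markdown","metadata":{"_id":"7F15AB569AC94A4FB3ECAE20E73A67A9","id":"93798EDC6A5F4C94AA9DB9BAF7932CDB","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["\n","\n","![Image Name](https://cdn.kesci.com/upload/sjq96jfv8f.png?imageView2/0/w/960/h/960)  \n"]},{"cell_type":"markdown","metadata":{"_id":"2B6980037F1B43238F991D9DF01B5581","id":"436C0CF837D34DF6AE5DD4FAED577764","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Conditional probability  \n","\n","Conditional probability: the probability that one event occurs given that another event has occurred.  \n","Note: conditional probability is directional; in general, $P(A|B)$ and $P(B|A)$ are not equal.  \n","\n","For example:  \n","- $P(A|B)$ is the probability that a study uses certain language, given that it is replicable;  \n","- $P(B|A)$ is the probability that a study is replicable, given that it uses certain language.  \n","\n","People often confuse the two, especially in Bayesian reasoning, so it is important to be clear about which event is the condition and which is the outcome."]},{"cell_type":"markdown","metadata":{"_id":"FF041A9D25BC4D26B5B16147482E99AF","id":"5A1F673FBB2C4FC28F962907546A762E","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Likelihood  \n","\n","**Definition of likelihood**  \n","\n","From the conditional probabilities, we know that $P(A|B) = 0.56$ and $P(A|B^c) = 0.45$; that is, certain language is more common among replicable studies than among non-replicable ones.  \n","\n","Likelihood describes how probable an observed data pattern is under different hypotheses. In this example, we compare two hypotheses:  \n","- $P(A|B) = 0.56$: under the hypothesis that the study is replicable, certain language is relatively likely.  \n","- $P(A|B^c) = 0.45$: under the hypothesis that the study is not replicable, certain language is less likely.  \n","\n","So the likelihood function tells us that the observed data pattern (certain language) is more probable under the replicable hypothesis:  \n","$P(A|B) = 0.56 > P(A|B^{c}) = 0.45$  \n","\n","This is the core of the likelihood function: it reflects how probable the data $A$ are under each hypothesis (replicable or not replicable).  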
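
A likelihood function for this example can be sketched as a mapping from each hypothesis to the probability it assigns to the observed data A. The plain dictionary below is an illustration, not a library object; it shows that the two values need not sum to 1, and that their ratio summarizes how much more compatible the data are with B than with B^c.

```python
# Likelihood of each hypothesis given the observed data A (certain language):
# L(B | A) = P(A | B) and L(B^c | A) = P(A | B^c)
likelihood = {'B': 0.56, 'B_c': 0.45}

# Likelihoods are not a distribution over hypotheses, so they need not sum to 1
total = sum(likelihood.values())
print(round(total, 2))

# Likelihood ratio: how many times more probable A is under B than under B^c
ratio = likelihood['B'] / likelihood['B_c']
print(round(ratio, 3))
```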
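\n"]},{"cell_type":"markdown","metadata":{"_id":"EBF5574E5D514D099F1E542A25C73CB2","id":"DFEE23DB43F246D59042B5E3A9D56A9C","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["For example, the likelihood given the data $A$ (the study uses certain language) can be written as $L(\\cdot|A)$:  \n","$$  \n","L(B|A) = P(A|B) \\quad\\quad L(B^{c}|A) = P(A|B^{c})  \n","$$  \n","\n","These two expressions give the probability of certain language under the two possible scenarios: the study is replicable, or it is not.  \n","\n","*Note: in a likelihood function, the data are fixed (they have already been observed), while the hypothesis varies.*"]},{"cell_type":"markdown","metadata":{"_id":"522E8487320A4C03B55370205BC70957","id":"B9E6EF35A0324F53B9DE7E4797DEDA93","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Probability vs. likelihood  \n","\n","🤔 Probability and likelihood both seem to express how likely something is. What is the difference between them?  \n","  \n","| Property    | Probability                                             | Likelihood                                         |  \n","|-------------|---------------------------------------------------------|----------------------------------------------------|  \n","| Definition  | Probability of the data, given a fixed hypothesis       | Probability of the observed data under different hypotheses |  \n","| Range       | [0, 1]                                                  | Not restricted to [0, 1] (e.g., densities for continuous data) |  \n","| Sum         | Sums to 1 over all possible events                      | Need not sum to 1                                  |  \n","| Use         | Prediction and decision-making                          | Model estimation and selection                     |  \n","\n","Notes:  \n","* The prior probabilities sum to 1, because the prior is a distribution over all possible outcomes: it gives the probability of event $B$ (and of $B^c$) and reflects our subjective assessment;  \n","* 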
The likelihoods need not sum to 1, because the likelihood function is not a probability distribution over the hypotheses; it tells us the relative probability of event $A$ under the different hypotheses."]},{"cell_type":"markdown","metadata":{"_id":"9C6D03A81823448BAF841D0A31BB7935","id":"F1E69183F84A4197AA88A3F18809179F","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["For our example, the probabilities and likelihoods can be summarized in the table below:  \n","\n","TABLE 2.2: Prior probabilities and likelihoods of reproducible research.  \n","\n","| event       |     $B$     |     $B^c$    |   total   |  \n","|-------------|-------------|--------------|-----------|  \n","| prior       |     0.4     |     0.6      |     1     |  \n","| likelihood  |     0.56    |     0.45     |   ≠ 1     |"]},{"cell_type":"markdown","metadata":{"_id":"5D88441C0EE945FCA95BD99A5040421D","id":"0A6FB7E2230B48219DC90F9F19EDD85E","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["## The denominator (normalizing constant): marginal probability  \n","\n","The likelihood function describes how often certain language is used among replicable and among non-replicable studies.  \n","\n","What we want to know is the overall probability that a study uses certain language, across all studies.  \n","\n","This is called the marginal probability $P(A)$."]},{"cell_type":"markdown","metadata":{"_id":"BF86779031B04107B1939DF117337534","id":"C56D47A1748549D7B2BE5F7996176749","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["To obtain it, we multiply the likelihood of event $A$ under each hypothesis by the probability of that hypothesis itself (i.e., we take each hypothesis's own probability into account), and then add the two products:  \n","$$  \n"," P(A) = P(A \\cap B) + P(A \\cap B^{c}) = L(B|A) \\cdot P(B) + L(B^{c}|A) \\cdot P(B^{c})  \n","$$  \n","\n","$$ P(A) = 0.56 \\times 0.4 + 0.45 \\times 0.6 = 0.494 $$  
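
The weighted sum above translates directly into code. As a sketch using the same numbers (the names are illustrative), the snippet computes P(A) and then divides each joint probability by it; the resulting values add up to 1, which is why P(A) is called the normalizing constant.

```python
prior = {'B': 0.40, 'B_c': 0.60}
likelihood = {'B': 0.56, 'B_c': 0.45}  # P(A | hypothesis)

# Joint probabilities: P(A and hypothesis) = likelihood * prior
joint = {h: likelihood[h] * prior[h] for h in prior}

# Marginal probability of A: sum of the joint probabilities over hypotheses
p_A = sum(joint.values())
print(round(p_A, 3))

# Dividing by P(A) rescales the joint probabilities into a posterior
posterior = {h: joint[h] / p_A for h in joint}
print({h: round(p, 3) for h, p in posterior.items()})
print(round(sum(posterior.values()), 3))
```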
\n"]},{"cell_type":"markdown","metadata":{"_id":"AE1A881A0B5049E188536745016CA9B7","id":"FAF2CB763E4E45E78C2ED502DDF9A45F","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### Posterior probability model via Bayes’ Rule"]},{"cell_type":"markdown","metadata":{"_id":"215AEDEE85144A8B8EF61191D2C0D7BC","id":"E4EF151ECB8441E1885A53FA47378F1A","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Intuition**  \n","\n","Finally, we compute the posterior probability of event $B$: once we know that a study uses a certain language style, how likely is it to be replicable?  \n","\n","We again express this as a conditional probability: $P(B|A)$.  \n","\n","Before the formal calculation, let's build some intuition by revisiting this table.  \n","\n","||$B$|$B^c$|Total|  \n","|---|---|---|---|  \n","|$A$|$0.56 \\times 0.4 = 0.224$|$0.45 \\times 0.6 = 0.27$|0.494|  \n","|$A^c$|0.176|0.33|0.506|  \n","|Total|0.4|0.6|1.0|  \n","\n","Notes:  \n","- $A$: the study uses certain language.  \n","- $A^c$: the study does not use certain language.  \n","- $B$: the study is replicable.  \n","- $B^c$: the study is not replicable.  \n","\n","Because we know this study **uses a certain language style**, we can focus directly on the first row ($A$):  \n","- In row $A$, 45.3% (0.224/0.494) of studies are replicable and 54.7% (0.27/0.494) are not.  \n","- So, according to the posterior probability, there is a 45.3% chance that this study is replicable.  \n"]},{"cell_type":"markdown","metadata":{"_id":"2F6FEE1196FF413D9DC75E49DE27510D","id":"9A1A48932DF54ECAA64204025CBA9302","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Formal calculation**  \n","\n","How can we derive this result from the mathematical form of Bayes' rule?  \n","\n","$$  \n","\\text{Posterior} = P(B|A) = \\frac{\\text{data} \\times \\text{prior}}{\\text{average probability of data}} = \\frac{P(A \\cap B)}{P(A)} = \\frac{L(B|A) \\cdot P(B)}{L(B|A) \\cdot P(B) + L(B^{c}|A) \\cdot P(B^{c})}  \n","$$  \n","\n","- $P(B|A) = \\frac{P(B)\\,L(B|A)}{P(A)} = \\frac{0.4 \\cdot 0.56}{0.494} \\approx 0.453$  \n","- 
Plugging the previously computed values into Bayes' rule gives the probability that a study using certain language is replicable.  \n","\n"]},{"cell_type":"markdown","metadata":{"id":"F25DE9503E93491AB8ECBCFEE6FCEA14","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Using the same approach, we can compute the probability that a study using certain language is not replicable. The results are shown in the table below.  \n","- Note that both the prior and the posterior probabilities sum to 1.  \n","\n","TABLE 2.4: The prior and posterior models of reproducibility.  \n","\n","\n","| event    | $B$     | $B^c$ | Total    |  \n","| --------  | -------- | -------- | -------- |  \n","| prior probability | 0.4 | 0.6 | 1 |  \n","| posterior probability | 0.453 | 0.547 | 1 |  \n"]},{"cell_type":"markdown","metadata":{"_id":"3845FF266B2F4EC5A5A0B278B058A20D","id":"2064DF0BEEC6442790E5BFB823C9B185","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Time to think 🧐: what difference does including the denominator make?"]},{"cell_type":"markdown","metadata":{"_id":"CEB912EE1CC941F6981238ACA99290B0","id":"5DBA1A513970453A974AF997B3D97252","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Simulation exercise: computing the posterior probability  \n","\n","\n","🤓 To deepen our understanding of the prior, the likelihood (data), and the posterior, we will write code to compute the posterior probability and put these concepts into practice."]},{"cell_type":"markdown","metadata":{"_id":"014643A57C28407E8B3FFABE2013D299","id":"556123429A584F7BA6F0C60C0D84EB10","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["1. 
Define the replicability categories and their prior probabilities"]},{"cell_type":"code","execution_count":20,"metadata":{"_id":"73947288991047B59848C3816D23BCFA","collapsed":false,"id":"9C7CFB4D2A7F4DC8A4C4E1BEB7368BBF","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Define the study types (replicated: yes or no)\n","article = pd.DataFrame({'replicated': ['yes', 'no']})\n","\n","# Define the prior probabilities, in the same order as the rows above\n","prior = [0.4, 0.6]"]},{"cell_type":"markdown","metadata":{"_id":"778C6170F14C43CC947B8E9A3A624813","id":"0C8CF38F2D764996891960116B2696F0","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["2. Simulate a set of studies that might be handed to you"]},{"cell_type":"code","execution_count":21,"metadata":{"_id":"5FE6653043FA4C9AB98D071433A517EC","collapsed":false,"id":"E38F6CBE358E439D97CF711B263487B3","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[{"data":{"text/html":["<div>\n","<style scoped>\n","    .dataframe tbody tr th:only-of-type {\n","        vertical-align: middle;\n","    }\n","\n","    .dataframe tbody tr th {\n","        vertical-align: top;\n","    }\n","\n","    .dataframe thead th {\n","        text-align: right;\n","    }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n","  <thead>\n","    <tr style=\"text-align: right;\">\n","      <th></th>\n","      <th>replicated</th>\n","    </tr>\n","  </thead>\n","  <tbody>\n","    <tr>\n","      <th>0</th>\n","      <td>yes</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    
</tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>no</td>\n","    </tr>\n","  </tbody>\n","</table>\n","</div>"],"text/plain":["  replicated\n","0        yes\n","1         no\n","1         no\n","1         no\n","1         no\n","1         no\n","1         no\n","1         no\n","1         no\n","1         no"]},"execution_count":21,"metadata":{},"output_type":"execute_result"}],"source":["# Simulate 10,000 studies, drawing each study's type according to the prior\n","np.random.seed(84735)\n","article_sim = article.sample(n=10000, weights=prior, replace=True)\n","# Show the first 10 rows\n","article_sim.head(10)"]},{"cell_type":"code","execution_count":22,"metadata":{"_id":"0798D80080374E6DB278A4BD30AFD568","collapsed":false,"id":"96FE1159A5554A3CB19FB748105E0E37","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[{"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAYEAAAEjCAYAAADUjb3BAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAf50lEQVR4nO3df1SUVf4H8PcYOMOPFGZGYKBTGphIgj/IgDUCxBQLTFGikjpglqGrYplCIjIbcKi1XV12SNtCLNIyCzVTOEWgtpmi9ANc0NQ1ClEBtVSYBH2+f/jlWScGYvAHwn2/zuEk936eO/dyaN48P+Z5FJIkSSAiIiH16e4JEBFR92EIEBEJjCFARCQwhgARkcAYAkREAmMIEBEJjCFARCQwhgARkcCsunsCt4L6+noUFhZi4MCBsLGx6e7pEBFds6amJhw7dgwTJkyAVqttt44hAKCwsBAxMTHdPQ0iousuLy8P06dPb7efIQBg4MCBAK78sIYOHdq9kyEiug4qKysRExMjv7+1hyEAyIeAhg4dilGjRnXzbIiIrp8/OsTNE8NERALrUgh8+eWXePjhh+Ho6AgbGxsMHjwYr7zyiklNWVkZxo0bB3t7ezg4OCAyMhJHjx41O15WVhY8PT2hVCoxaNAg6PV6NDc3t6k7deoUYmNjodVqYWtri4CAABQVFXVlCUREhC6EwLp16xAUFIT+/fvjnXfewbZt27B48WJcfUfqqqoqBAcH4+LFi9iwYQNycnJw6NAhBAYGoq6uzmS89PR0zJ8/H5GRkSgsLMTs2bORkZGBOXPmmNT99ttvCA0NRVFREVauXInNmzfD2dkZYWFh2LFjRxeXT0QkOMkCP//8s2RnZyfFx8d3WBcVFSVptVrpl19+kduOHTsmWVtbS4sWLZLb6uvr
JZVKJT333HMm26enp0sKhUI6cOCA3GYwGCQA0ldffSW3NTc3S15eXtL9999vyTLa2L9/vwRA2r9//zWNQ0R0q+js+5pFewJvvfUWLly4gMWLF7db09LSgq1bt2Lq1Kno16+f3H7XXXchJCQE+fn5cltBQQGMRiPi4uJMxoiLi4MkSdi0aZPclp+fjyFDhiAgIEBus7KyQkxMDPbu3YuamhpLlkJERLDwcNDOnTuhVqtRVVWFESNGwMrKCk5OTnj++efx66+/AgCOHDmCpqYm+Pj4tNnex8cHhw8fhtFoBABUVFQAALy9vU3qdDodtFqt3N9a296YAHDgwAFLlkJERLDwEtGamho0NjYiKioKSUlJWLFiBUpLS7Fs2TJUVFRg165daGhoAACo1eo226vVakiShDNnzkCn06GhoQFKpRJ2dnZma1vHAoCGhoZ2x2zt70htbS1qa2vN9lVWVna4LRFRb2VRCFy+fBlGoxHLli1DYmIiACA4OBh9+/ZFQkICioqKYGtrCwBQKBTtjnN1X2frLK39vdWrV0Ov13dYQ0QkGotCQKPR4IcffsCECRNM2idOnIiEhASUlZXh0UcfBWD+L/PTp09DoVDAwcFBHs9oNKKxsVEOj6trfX19TV67vTEB83seV5s1axYmTZpktq/1k3W3ooGJn3b3FHqkY5mPdPcUiHoEi0LAx8cHX3/9dZt26f8vD+3Tpw/c3d1hY2OD8vLyNnXl5eXw8PCASqUC8L9zAeXl5fDz85PrTpw4gfr6egwbNkxu8/b2bndMACa15uh0Ouh0uj9aIhGRUCw6MTx16lQAwPbt203at23bBgDw9/eHlZUVIiIi8PHHH+PcuXNyTXV1NYqLixEZGSm3hYWFQaVSITc312S83NxcKBQKTJ48WW6bMmUKqqqqsGfPHrmtpaUFeXl58PPzg6urqyVLISIiWLgnMH78eEREROAvf/kLLl++DH9/f+zbtw96vR7h4eF44IEHAAB6vR6jR49GeHg4EhMTYTQakZKSAq1WixdffFEeT61WIzk5GUuXLoVarcb48eNRWlqK1NRUzJw5E15eXnLtjBkzYDAYEBUVhczMTDg5OSE7OxsHDx7E559/fp1+HEREYrH4E8MffPABEhIS8Oabb2LixIl44403sGDBAmzcuFGu8fT0RElJCaytrTFt2jTExsbCw8MDO3fuxIABA0zGW7JkCVasWIGNGzdi/PjxyMrKQmJiIgwGg0mdUqlEUVERQkJCMHfuXERERKC2thbbt29HUFBQF5dPRCQ2hSRddb8HQZWVlcHX1xf79++/5e4iyhPDXcMTwyS6zr6v8S6iREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCu+YQeOutt6BQKGBvb9+mr6ysDOPGjYO9vT0cHBwQGRmJo0ePmh0nKysLnp6eUCqVGDRoEPR6PZqbm9vUnTp1CrGxsdBqtbC1tUVAQACKioqudRlEREK6phCoqanBwoUL4erq2qavqqoKwcHBuHjxIjZs2ICcnBwcOnQIgYGBqKurM6lNT0/H/PnzERkZicLCQsyePRsZGRmYM2eOSd1vv/2G0NBQFBUVYeXKldi8eTOcnZ0R
FhaGHTt2XMtSiIiEZHUtGz///PN48MEHoVarsXHjRpO+lJQUKJVKbN26Ff369QMA+Pr6YvDgwVi+fDleffVVAEBDQwPS0tLw7LPPIiMjAwAQHByM5uZmJCcnIyEhAV5eXgCAt99+GxUVFfjqq68QEBAAAAgJCcHw4cOxaNEi7Nmz51qWQ0QknC7vCeTl5WHHjh3Izs5u09fS0oKtW7di6tSpcgAAwF133YWQkBDk5+fLbQUFBTAajYiLizMZIy4uDpIkYdOmTXJbfn4+hgwZIgcAAFhZWSEmJgZ79+5FTU1NV5dDRCSkLu0JnDp1CgkJCcjMzMQdd9zRpv/IkSNoamqCj49Pmz4fHx989tlnMBqNUKlUqKioAAB4e3ub1Ol0Omi1WrkfACoqKhAYGGh2TAA4cOAA3NzczM65trYWtbW1ZvsqKyvbWSmROAYmftrdU+iRjmU+0t1TuCZdCoHZs2djyJAhiI+PN9vf0NAAAFCr1W361Go1JEnCmTNnoNPp0NDQAKVSCTs7O7O1rWO1jtvemFe/rjmrV6+GXq/veGFERIKxOAQ++ugjfPLJJ/jmm2+gUCg6rO2o/+q+ztZZWnu1WbNmYdKkSWb7KisrERMT0+62RES9lUUhcP78ecyZMwdz586Fq6srzp49CwC4ePEiAODs2bOwtraGRqMBYP4v89OnT0OhUMDBwQEAoNFoYDQa0djYCFtb2za1vr6+8vcajabdMQHzex6tdDoddDpd5xdLRCQAi04M19fX4+TJk3j99dfh6Ogof61fvx4XLlyAo6Mjpk+fDnd3d9jY2KC8vLzNGOXl5fDw8IBKpQLwv3MBv689ceIE6uvrMWzYMLnN29u73TEBmNQSEdEfsygEXFxcUFxc3OZrwoQJUKlUKC4uRlpaGqysrBAREYGPP/4Y586dk7evrq5GcXExIiMj5bawsDCoVCrk5uaavFZubi4UCgUmT54st02ZMgVVVVUml4K2tLQgLy8Pfn5+Zj+vQERE7bPocJBKpUJwcHCb9tzcXNx2220mfXq9HqNHj0Z4eDgSExNhNBqRkpICrVaLF198Ua5Tq9VITk7G0qVLoVarMX78eJSWliI1NRUzZ86UPyMAADNmzIDBYEBUVBQyMzPh5OSE7OxsHDx4EJ9//rnlqyciEtwNu3eQp6cnSkpKYG1tjWnTpiE2NhYeHh7YuXMnBgwYYFK7ZMkSrFixAhs3bsT48eORlZWFxMREGAwGkzqlUomioiKEhIRg7ty5iIiIQG1tLbZv346goKAbtRQiol7rmj4x3Co3N7fN4RzgyieEO/sX+rx58zBv3rw/rHN2dsbatWstnSIREZnBu4gSEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCYwgQEQmMIUBEJDCGABGRwBgCREQCsygEvvjiC8yYMQOenp6ws7ODm5sbHn30Uezfv79NbVlZGcaNGwd7e3s4ODggMjISR48eNTtuVlYWPD09oVQqMWjQIOj1ejQ3N7epO3XqFGJjY6HVamFra4uAgAAUFRVZsgQiIrqKRSHwxhtv4NixY5g/fz62bduGlStX4tSpU/D398cXX3wh11VVVSE4OBgXL17Ehg0bkJOTg0OHDiEwMBB1dXUmY6anp2P+/PmIjIxE
YWEhZs+ejYyMDMyZM8ek7rfffkNoaCiKioqwcuVKbN68Gc7OzggLC8OOHTuu4UdARCQuK0uKDQYDnJycTNrCwsLg4eGBjIwMjB07FgCQkpICpVKJrVu3ol+/fgAAX19fDB48GMuXL8err74KAGhoaEBaWhqeffZZZGRkAACCg4PR3NyM5ORkJCQkwMvLCwDw9ttvo6KiAl999RUCAgIAACEhIRg+fDgWLVqEPXv2XMOPgYhITBbtCfw+AADA3t4eXl5e+OmnnwAALS0t2Lp1K6ZOnSoHAADcddddCAkJQX5+vtxWUFAAo9GIuLg4kzHj4uIgSRI2bdokt+Xn52PIkCFyAACAlZUVYmJisHfvXtTU1FiyFCIiwnU4MfzLL7+grKwM9957LwDgyJEjaGpqgo+PT5taHx8fHD58GEajEQBQUVEBAPD29jap0+l00Gq1cn9rbXtjAsCBAweudSlERMKx6HCQOXPmzMGFCxewZMkSAFcO8QCAWq1uU6tWqyFJEs6cOQOdToeGhgYolUrY2dmZrW0dq3Xc9sa8+nXbU1tbi9raWrN9lZWVHW5LRNRbXVMILF26FO+99x6ysrLg6+tr0qdQKNrd7uq+ztZZWvt7q1evhl6v77CGiEg0XQ4BvV6PtLQ0pKen489//rPcrtFoAJj/y/z06dNQKBRwcHCQa41GIxobG2Fra9um9upg0Wg07Y4JmN/zuNqsWbMwadIks32VlZWIiYnpcHsiot6oSyGg1+uRmpqK1NRUvPzyyyZ97u7usLGxQXl5eZvtysvL4eHhAZVKBeB/5wLKy8vh5+cn1504cQL19fUYNmyY3Obt7d3umABMas3R6XTQ6XSdXCERkRgsPjH8yiuvIDU1FcnJyVi2bFmbfisrK0RERODjjz/GuXPn5Pbq6moUFxcjMjJSbgsLC4NKpUJubq7JGLm5uVAoFJg8ebLcNmXKFFRVVZlcCtrS0oK8vDz4+fnB1dXV0qUQEQnPoj2B119/HSkpKQgLC8MjjzyCr7/+2qTf398fwJU9hdGjRyM8PByJiYkwGo1ISUmBVqvFiy++KNer1WokJydj6dKlUKvVGD9+PEpLS5GamoqZM2fKnxEAgBkzZsBgMCAqKgqZmZlwcnJCdnY2Dh48iM8///xafgZERMKyKAQ++eQTAFeu7y8oKGjTL0kSAMDT0xMlJSVYvHgxpk2bBisrK4wdOxbLly/HgAEDTLZZsmQJbr/9dhgMBixfvhwuLi5ITEyUrzZqpVQqUVRUhEWLFmHu3LlobGzEiBEjsH37dgQFBVm0aCIiusKiECgpKel0ra+vb6f/Qp83bx7mzZv3h3XOzs5Yu3Ztp+dAREQd411EiYgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMAeqSlvOncfbL99By/nR3T4V6Of6u3VgMAeqSS+dP45d/r8cl/o9JNxh/126sHhcC58+fR0JCAlxdXaFSqTBixAi8//773T0tIqIeyaq7J2CpyMhIlJaWIjMzE/fccw/WrVuHJ554ApcvX8aTTz7Z3dMjIupRelQIbNu2DZ999pn8xg8AISEh+PHHH/HSSy8hOjoat912WzfPkoio5+hRh4Py8/Nhb2+PqKgo
k/a4uDgcP34ce/bs6aaZERH1TD0qBCoqKjB06FBYWZnuwPj4+Mj9RETUeT3qcFBDQwPuvvvuNu1qtVrub09tbS1qa2vN9n377bcAgMrKymuf5HX224nD3T0Fs5obfjL5762mrKysu6fQ4/B3rWtu1d+11vezpqamDut6VAgAgEKh6FLf6tWrodfrOxw7Jiamy/MSVcPW17t7Cmb5ru3uGdD1xt+1rjl27BjGjBnTbn+PCgGNRmP2r/3Tp69cP9y6R2DOrFmzMGnSJLN9Z86cQWVlJUaOHAkbG5vrM1kiom7U1NSEY8eOYcKECR3W9agQ8Pb2xvr169HS0mJyXqC8vBwAMGzYsHa31el00Ol07faHhoZev4kSEd0COtoDaNWjTgxPmTIF58+fx0cffWTSvnbtWri6usLPz6+bZkZE1DP1qD2BiRMn4qGHHkJ8fDx+/fVXeHh4YP369SgoKEBeXh4/I0BEZCGFJElSd0/CEufPn8eSJUuwYcMGnD59Gp6enkhKSsLjjz/e3VMjIupxelwIEBHR9dOjzgkQEdH1xRAgs1JTU6FQKHDgwAE88cQT6N+/P5ydnTFjxgz88ssvcp3RaERSUhIGDRqEvn37ws3NDXPmzMHZs2e7b/J0y9q1axcUCgXWr1/fpu+dd96BQqFAaWkpAGDfvn2YNGkS1Go1VCoVRo4ciQ0bNphs09jYiIULF2LQoEFQqVRQq9W47777zI5P5vWoE8N0802dOhXR0dF45plnUF5ejqSkJABATk4OJEnC5MmTUVRUhKSkJAQGBuL777/HsmXLsHv3buzevRtKpbKbV0C3ksDAQIwcORIGg0G+CWSrf/7znxg9ejRGjx6N4uJihIWFwc/PD6tWrUL//v3x/vvvIzo6Go2NjYiNjQUAvPDCC3j33XeRlpaGkSNH4sKFC6ioqOjw7gH0OxKRGcuWLZMASK+99ppJ++zZsyWVSiVdvnxZKigoMFvzwQcfSACkN99882ZOmXqINWvWSACkb775Rm7bu3evBEBau3atJEmS5OnpKY0cOVJqbm422TY8PFzS6XTSpUuXJEmSpGHDhkmTJ0++aXPvjXg4iDr0+09Z+/j4wGg04tSpU/jiiy8AQP6rrFVUVBTs7OxQVFR0s6ZJPcgTTzwBJycnGAwGuS0rKwsDBgxAdHQ0Dh8+jKqqKkyfPh0A0NLSIn89/PDDqK2txcGDBwEA999/P7Zv347ExESUlJT84X1yqC2GAHVIo9GYfN96eKepqQkNDQ2wsrLCgAEDTGoUCgVcXFy4S05mKZVKzJo1C+vWrcPZs2dRV1eHDRs2YObMmVAqlTh58iQAYOHChbC2tjb5mj17NgCgvr4eAPCPf/wDixcvxqZNmxASEgK1Wo3Jkyfjhx9+6Lb19TQ8J0BdptFo0NLSgrq6OpMgkCQJJ06cwOjRo7txdnQri4+PR2ZmJnJycmA0GtHS0oLnn38eAKDVagEASUlJiIyMNLv9kCFDAAB2dnbQ6/XQ6/U4efKkvFcQERGBqqqqm7OYHo4hQF0WGhqK1157DXl5eViwYIHc/tFHH+HChQu8HxO1S6fTISoqCtnZ2bh48SIiIiJw5513ArjyBj948GB89913yMjI6PSYzs7OiI2NxXfffYcVK1agsbERtra2N2oJvQZDgLrsoYcewoQJE7B48WL8+uuvGDNmjHx10MiRI/HUU0919xTpFjZ//nz5fl9r1qwx6Vu9ejUmTpyICRMmIDY2Fm5ubjh9+jQqKytRVlaGDz/8EADg5+eH8PBw+Pj4wNHREZWVlXj33XcREBDAAOgkhgB1mUKhwKZNm5Camoo1a9YgPT0dWq0WTz31FDIyMnh5KHXo/vvvx8CBA2FjY9NmrzEkJAR79+5Feno6EhIScObMGWg0Gnh5eeGxxx6T68aOHYstW7bg73//OxobG+Hm5oann34aS5YsudnL6bF42wgi6hbff/89hg8fDoPBIJ/wpZuPIUBEN9WRI0fw
448/4uWXX0Z1dTUOHz7MQzfdiJeIEtFN9corr+Chhx7C+fPn8eGHHzIAuhn3BIiIBMY9ASIigTEEiIgExhAgIhIYQ4CISGAMAaIuUigUSE1Nlb8vKSmBQqFASUnJDXm948ePIzU1Fd9+++11Hzs3NxcKhQLHjh277mPTrY0hQHSdjBo1Crt378aoUaNuyPjHjx+HXq+/ISFA4uJtI6jXu1k3EuvXrx/8/f1v+OsQXU/cE6BepfXZyGVlZZg2bRocHR3h7u4OSZKQnZ2NESNGwMbGBo6Ojpg2bRqOHj1qsn1wcDCGDRuGXbt2wd/fHzY2NnBzc8PSpUtx6dKlDl+7vcNBe/bsQUREBDQaDVQqFdzd3ZGQkCD3Hz58GHFxcRg8eDBsbW3h5uaGiIgIlJeXm4zdemvuuLg4KBSKNoejOvNMXgD4+uuvMWbMGKhUKri6uiIpKQnNzc2d/AlTb8MQoF4pMjISHh4e+PDDD7Fq1SrMmjULCQkJGDduHDZt2oTs7GwcOHAAf/rTn+SHmLQ6ceIEHn/8cUyfPh2bN2/GtGnTkJaWhvnz51s8j8LCQgQGBqK6uhp/+9vfsH37diQnJ5u85vHjx6HRaJCZmYmCggIYDAZYWVnBz89PfoLWqFGj5DttJicny89wnjlzJgCguLgYY8aMwdmzZ7Fq1Sps3rwZI0aMQHR0NHJzc+XX+s9//oPQ0FCcPXsWubm5WLVqFb755hukpaVZvDbqJbrvyZZE11/rs5FTUlLktt27d0sApNdff92k9qeffpJsbGykRYsWyW1BQUESAGnz5s0mtc8++6zUp08f6ccff5TbAEjLli2Tvy8uLpYASMXFxXKbu7u75O7uLjU1NXV6DS0tLdLFixelwYMHSwsWLJDbS0tLJQDSmjVr2mzT2WfyRkdHSzY2NtKJEydMXs/T01MCIP33v//t9Dypd+CeAPVKU6dOlf+9detWKBQKxMTEmDyv1sXFBcOHD29z+Ob2229v82zlJ598EpcvX8bOnTs7PYdDhw7hyJEjeOaZZ6BSqdqta2lpQUZGBry8vNC3b19YWVmhb9+++OGHH1BZWfmHr2PJM3mLi4sRGhoKZ2dnefvbbrsN0dHRnV4X9S48MUy9kk6nk/998uRJSJJk8sZ3tbvvvtvke3N1Li4uAGDRc5Pr6uoAAHfccUeHdS+88AIMBgMWL16MoKAgODo6ok+fPpg5c2anHpx+9TN5Fy5caLam9Zm8DQ0N8lquZq6NxMAQoF5JoVDI/9ZqtVAoFNi1a5fZB938vu335wiAK+cJgCvPVe6s1ucu//zzzx3W5eXl4emnn27zKMX6+no4ODj84etY8kxejUYjr+Vq5tpIDAwB6vXCw8ORmZmJmpoak6dStefcuXPYsmWLySGhdevWoU+fPnjwwQc7/br33HMP3N3dkZOTgxdeeKHdJ60pFIo2fZ9++ilqamrg4eEht7XW/H7vwJJn8oaEhGDLli04efKkvMdz6dIlfPDBB51eF/UuDAHq9caMGYPnnnsOcXFx2LdvHx588EHY2dmhtrYWX375Jby9vREfHy/XazQaxMfHo7q6Gvfccw+2bduGf/3rX4iPj5cfht5ZBoMBERER8Pf3x4IFC3DnnXeiuroahYWFeO+99wBcCanc3Fx4enrCx8cH+/fvx1//+tc2h5Hc3d1hY2OD9957D0OHDoW9vT1cXV3h6ura6WfyJicnY8uWLRg7dixSUlJga2sLg8GACxcuXONPmXqs7j4zTXQ9tV4dVFdX16YvJydH8vPzk+zs7CQbGxvJ3d1devrpp6V9+/bJNUFBQdK9994rlZSUSPfdd5+kVColnU4nvfzyy22uvEEnrg6SpCtXJ02cOFHq37+/pFQqJXd3d5Orfs6cOSM988wzkpOTk2Rrays98MAD0q5du6SgoCApKCjIZKz169dLnp6ekrW1dZvX/+6776THHntMcnJykqytrSUXFxdp7Nix0qpVq0zG+Pe//y35
+/tLSqVScnFxkV566SXpzTff5NVBguJDZYiuEhwcjPr6elRUVHT3VIhuCl4iSkQkMIYAEZHAeDiIiEhg3BMgIhIYQ4CISGAMASIigTEEiIgExhAgIhIYQ4CISGAMASIigTEEiIgExhAgIhLY/wF6c+l00BxujwAAAABJRU5ErkJggg==","text/plain":["<Figure size 400x300 with 1 Axes>"]},"metadata":{},"output_type":"display_data"}],"source":["#我们可以通过画图来查看这些被投放研究的可重复性比例。\n","article_sim['replicated'].value_counts().plot.bar()\n","plt.xticks(rotation=0)\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"A67F09F99463465D922E85F27AA61FAA","id":"781F44A7CAFA406FB210290BEC0285FA","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["3. 接下来我们需要模拟10000项研究使用确切语言风格的情况，  \n","- 和之前相同，不可重复研究使用确切语言风格的可能性为45% ，  \n","- 可重复研究使用确切语言风格的可能性为56% "]},{"cell_type":"code","execution_count":26,"metadata":{"_id":"EDE9730D549649EAA93AD55395737CD7","collapsed":false,"id":"747161B8FCE14DB68BB6BE5E322F597E","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# 设置条件概率\n","article_sim['data_model'] = np.where(article_sim['replicated'] == 'no', 0.45, 0.56)\n","\n","# 定义研究是否使用确切语言\n","data = ['certain', 'uncertain']\n","\n","# 设置随机种子，以便得到重复的结果\n","rng=np.random.default_rng(84735)\n","# 生成确切语言相关的数据\n","article_sim['language'] = article_sim.apply(lambda x: rng.choice(data, 1, p = [x.data_model, 1-x.data_model])[0], axis=1)"]},{"cell_type":"code","execution_count":30,"metadata":{"_id":"E1F09028FF044BED8FFB4AE3E0CB8988","collapsed":false,"id":"E5CA81617B1043C49D98F743BE5F9F15","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[{"data":{"text/html":["<div>\n","<style scoped>\n","    .dataframe tbody tr th:only-of-type {\n","        vertical-align: middle;\n","    }\n","\n","    .dataframe tbody tr th {\n","        
vertical-align: top;\n","    }\n","\n","    .dataframe thead th {\n","        text-align: right;\n","    }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n","  <thead>\n","    <tr style=\"text-align: right;\">\n","      <th>replicated</th>\n","      <th>no</th>\n","      <th>yes</th>\n","    </tr>\n","    <tr>\n","      <th>language</th>\n","      <th></th>\n","      <th></th>\n","    </tr>\n","  </thead>\n","  <tbody>\n","    <tr>\n","      <th>certain</th>\n","      <td>2643</td>\n","      <td>2340</td>\n","    </tr>\n","    <tr>\n","      <th>uncertain</th>\n","      <td>3330</td>\n","      <td>1687</td>\n","    </tr>\n","  </tbody>\n","</table>\n","</div>"],"text/plain":["replicated    no   yes\n","language              \n","certain     2643  2340\n","uncertain   3330  1687"]},"execution_count":30,"metadata":{},"output_type":"execute_result"}],"source":["# Count the studies in each (language, replicated) combination\n","(\n","  article_sim.groupby(['language', 'replicated'])\n","    .size()\n","    .unstack(fill_value=0)\n",")"]},{"cell_type":"markdown","metadata":{"_id":"DE820DB7BC9A4260A72A97A92D175958","id":"BCB4367AB90649378E518B8CFEB7CE3A","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["4. 
Compute the posterior  \n","\n","Recall our priors:  \n","- Replicable studies: $P(B)=0.4$  \n","- Non-replicable studies: $P(B^c)=0.6$  \n","\n","From the simulated counts above we can compute the likelihoods:  \n","- About 58.1% of replicable studies used certain language (2340/(2340+1687)), so $P(A|B)=0.581$  \n","- About 44.2% of non-replicable studies used certain language (2643/(2643+3330)), so $P(A|B^c)=0.442$"]},{"cell_type":"markdown","metadata":{"_id":"491F9BD4FC4249B2A816ABEC734F72C5","id":"B3788583403F48FD8E9185AAFAEC0F06","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Combining the prior and the likelihood, we can compute the denominator (the marginal probability):  \n","- $P(A) = L(B|A)*P(B) + L(B^{c}|A)*P(B^{c}) = 0.581*0.4 + 0.442*0.6 = 0.2324 + 0.2652 ≈ 0.498$  \n","\n","Finally, we compute the posterior, i.e. the probability that a study is replicable given that it uses certain language:  \n","- $P(B|A) = \\frac{L(B|A)*P(B)}{P(A)} = (0.581*0.4)/0.498 ≈ 0.467$  \n","- In the simulation, 4983 of the 10000 studies used certain language (the denominator)  \n","- Of these, 2340/4983 ≈ 47% are replicable, which agrees with the posterior computed above"]},{"cell_type":"code","execution_count":31,"metadata":{"_id":"463C518551694032AF27D25FB04F8AE7","collapsed":false,"id":"C0DE035866FD4AD39D7F980DB28AD3C5","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Studies using certain language: 4983\n"]},{"data":{"text/plain":["replicated\n","no     2643\n","yes    2340\n","Name: count, dtype: int64"]},"execution_count":31,"metadata":{},"output_type":"execute_result"}],"source":["usage_yes = article_sim[article_sim['language'] == 'certain']\n","print('Studies using certain language:', 
usage_yes['replicated'].value_counts().sum())\n","usage_yes['replicated'].value_counts()"]},{"cell_type":"markdown","metadata":{"_id":"4F9AD7BE225841788D37CEEDC528748B","id":"1569B8778BE44EF38DB13A131CBB1724","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["同样地，通过画图来可视化使用确切语言的研究的情况"]},{"cell_type":"code","execution_count":32,"metadata":{"_id":"0AEE39A369D24936AB824323E939DB11","collapsed":false,"id":"7F4F99A182204E39B39398C9AE8A7FA7","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[{"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAA9gAAAHkCAYAAADFDYeOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA60ElEQVR4nO3de5iVZb038O/SwRkOKQ3jYdBUtqZEwE5JgdoqRCqpKGBkFhmoaeouLUvBE1Bibju8mi+2bbeNirJMRV/NNCWw2p6h3ECgG5XcuUHk4BFGQZ73jzYrxxlU9BkR+Hyuay4v7ue37ue+lzPrt77rWCmKoggAAADwlmy1sRcAAAAAmwMBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgGbLdLkyZNTqVSycOHCjb0UNhN//vOfM378+Lf0O+X3EuDNcfu5ZdODeScRsAFK8Oc//zkTJkx4S4358MMPz913353GxsbyFgYAmzk9mHeSmo29AIBN2erVq1OpVEqZa/vtt8/2229fylwAsLnTg3kn8gw2/K/bb789Rx11VHbZZZfU1dVlzz33zMknn5ylS5c2qxs/fnwqlUrmzp2bY489Ntttt1123HHHHH/88XnmmWea1T799NM54YQTUl9fn06dOuXwww/Po48+mkqlkvHjx1frRo0ald13373Fmtad65UmTZqUAw88MDvssEM6duyYXr165ZJLLsnq1aub1RVFkYsuuii77bZb6urq8sEPfjC33357BgwYkAEDBjSrffbZZ/OVr3wl3bp1yzbbbJOdd945Z5xxRl544YUNvyLfgtbWlrS8fhYuXJhKpZJvfetb+c53vpNu3bqlU6dO6d+/f+65554Wl7/33nszZMiQdOnSJXV1ddljjz1yxhlnNKv5r//6r3zqU5/KDjvskNra2rzvfe/LpEmTmtXMmDEjlUolP/nJT3LmmWdm5513Tm1tbX7wgx9kxIgRSZKBAwemUqmkUqlk8uTJSd7471ZrL08bMGBAevbsmfvvvz8HHHBAOnTokH/4h3/IxRdfnLVr177xKxdgC6Ovb/y+nujBbHk8gw3/65FHHkn//v1z4oknZrvttsvChQvzne98J//0T/+U2bNnp127ds3qjz766BxzzDE54YQTMnv27IwdOzZJctVVVyVJ1q5dmyFDhuSBBx7I+PHjs+++++buu+/O4MGD3/I6P/WpT1
Wb5oMPPpiJEydm/vz51XMnybnnnptvfOMbOemkkzJ8+PD893//d0488cSsXr06e+21V7Vu5cqVOeigg/LXv/4155xzTnr37p25c+fmggsuyOzZs3PHHXe85qPDa9eufUNNplKpZOutt35Le3+1SZMmpXv37rn00kuTJOeff34OO+ywPPbYY9luu+2SJLfddluGDBmS973vffnOd76TXXfdNQsXLsxvfvOb6jx//vOf86EPfSi77rprvv3tb2ennXbKbbfdli9+8YtZunRpxo0b1+y8Y8eOTf/+/fOv//qv2WqrrfLBD34wK1asyDnnnJNJkyZl3333TZLsscceSTb8d+vVFi9enE9/+tM588wzM27cuEydOjVjx45N165dc9xxx5V1dQJsVvT1jd/X9WC2SAVsgX74wx8WSYrHHnus1eNr164tVq9eXfzlL38pkhQ33nhj9di4ceOKJMUll1zS7DKnnnpqUVdXV6xdu7YoiqL41a9+VSQpvve97zWr+8Y3vlEkKcaNG1cd++xnP1vstttuLdax7lzr8/LLLxerV68ufvzjHxdbb711sXz58qIoimL58uVFbW1tccwxxzSrv/vuu4skxUEHHdRsPVtttVVx//33N6u99tpriyTFLbfcst7zv3KNr/fT2v5e7aCDDmq2tnVeff089thjRZKiV69exZo1a6rj9913X5GkuPrqq6tje+yxR7HHHnsUq1atWu95Dz300GKXXXYpnnnmmWbj//zP/1zU1dVVr9fp06cXSYoDDzywxRy//OUviyTF9OnTX3OPr/W71drv5UEHHVQkKe69995m8/To0aM49NBDX/NcAFsKff2gZut5p/R1PZgtkZeIw/9asmRJPv/5z+c973lPampq0q5du+y2225Jknnz5rWoP/LII5v9u3fv3mlqasqSJUuSJHfeeWeS5BOf+ESzumOPPfYtrfOPf/xjjjzyyHTp0iVbb7112rVrl+OOOy4vv/xyHn744STJPffckxdffLHFufv169fiJWs333xzevbsmQ984ANZs2ZN9efQQw9NpVLJjBkzXnM9J510Uu6///7X/bnpppve0r5bc/jhhzd79Lx3795Jkr/85S9JkocffjiPPPJITjjhhNTV1bU6R1NTU6ZNm5Zhw4alQ4cOza6Dww47LE1NTS1edn700Udv0Do39Hfr1Xbaaafsv//+zcZ69+5d3ScALenrG7ev68FsqbxEHPK3l0Mdcsgh+Z//+Z+cf/756dWrVzp27Ji1a9emX79+WbVqVYvLdOnSpdm/a2trk6Rau2zZstTU1KS+vr5Z3Y477vim1/n444/ngAMOyN57753LLrssu+++e+rq6nLffffltNNOa3bu9Z3r1WNPPvlkFixYsN6XSL36PUqvttNOO2WHHXZ43bWX9SEkr/R6/w+eeuqpJMkuu+yy3jmWLVuWNWvW5PLLL8/ll1/eas2rr4MN+YTRN/O79Wqv3mfyt72+kcsCbIn09Y3f1/VgtlQCNiSZM2dOHnzwwUyePDmf/exnq+MLFix403N26dIla9asyfLly5s148WLF7eoraury4svvthi/NVN5YYbbsgLL7yQ66+/vvroa5L86U9/anHu5G9N9tUWL17c7NHuhoaGtG/fvtn7vF6poaGh1fF1vva1r2XChAmvWZMku+222+t+fUZdXV2LD5RJXv/OwPqs+zTQv/71r+utefe7352tt946n/nMZ3Laaae1WtOtW7dm/96QBwva4ncLgNemr2/8vq4Hs6USsCF/v7Fe92j1OldeeeWbnvOggw7KJZdckl/84hc55ZRTquM///nPW9TuvvvuWbJkSZ588snqI9EvvfRSbrvtttddZ1EU+bd/+7dmdX379k1tbW1+8YtfZPjw4dXxe+65J3/5y1+aNeIjjjgiF110Ubp06dKiib0RJ510Uo444ojXrXv1ddua3XffPb/85S/z4osvVuuXLVuWu+66K9tuu+0Gr22vvfbKHn
vskauuuipf/vKXW11Dhw4dMnDgwPzxj39M7969s80222zweZKWz3Ss0xa/WwC8Nn194/d1PZgtlYANSbp375499tgjY8aMSVEUqa+vz0033ZTbb7/9Tc85ePDgfPjDH86ZZ56ZZ599Nn369Mndd9+dH//4x0mSrbb6+0cgHHPMMbngggvyyU9+Ml/96lfT1NSU7373u3n55ZebzXnwwQdnm222ybHHHpuzzjorTU1N+d73vpcVK1Y0q6uvr8+Xv/zlfOMb38i73/3uDBs2LH/9618zYcKENDY2Njv3GWeckeuuuy4HHnhgvvSlL6V3795Zu3ZtHn/88fzmN7/JmWeemb59+653n127dk3Xrl3f9PX0Sp/5zGdy5ZVXZuTIkfnc5z6XZcuW5ZJLLnlT4XqdSZMmZciQIenXr1++9KUvZdddd83jjz+e2267LT/96U+TJJdddln+6Z/+KQcccEBOOeWU7L777nnuueeyYMGC3HTTTfntb3/7uufp2bNnkuT73/9+3vWud6Wuri7dunVrk98tAF6bvv7O6Ot6MFsiH3IGSdq1a5ebbrope+21V04++eQce+yxWbJkSe644443PedWW22Vm266KZ/85Cdz8cUX56ijjsrvf//7TJkyJUnSuXPnam23bt1y44035umnn87HP/7xfPWrX82IESNafP1D9+7dc91112XFihUZPnx4vvCFL+QDH/hAvvvd77Y4/8SJE3PhhRfmV7/6VY488sh897vfzfe+973ssMMOzc7dsWPH/P73v8+oUaPy/e9/P4cffng+8YlP5Lvf/W522WWXVr/Hs618+MMfzo9+9KPMnTs3Rx11VC688MKMHTu21e/GfqMOPfTQ/O53v0tjY2O++MUvZvDgwfna177W7D1rPXr0yKxZs9KzZ8+cd955OeSQQ3LCCSfk2muvzaBBg97Qebp165ZLL700Dz74YAYMGJD99tsvN910U5v8bgHw2vT1d0Zf14PZElWKoig29iJgS/Kzn/0sn/70p/Mf//Ef+dCHPvS2nvuxxx5L9+7dM27cuJxzzjlv67kBYHOkrwOvJGBDG7r66qvzxBNPpFevXtlqq61yzz335Jvf/Gb22Wef6td9tJUHH3wwV199dT70oQ9l2223zUMPPZRLLrkkzz77bObMmfOWPvUUALZE+jrwejb4JeLPPfdczjrrrBxyyCHZfvvtU6lUMn78+FZrZ82alY9+9KPp1KlTOnfunOHDh+fRRx9ttfbyyy9P9+7dU1tbm27dumXChAlZvXp1i7olS5Zk1KhRaWhoSIcOHdK/f/9Mmzat1TnvuOOO9O/fPx06dEhDQ0NGjRpV/S5DeDu8613vys9//vMcc8wxOeyww/Jv//ZvGTVqVJt8J/SrdezYMQ888EBOOOGEHHzwwTn33HOzzz775A9/+IMmDJs5vRrahr4OvK5iAz322GPFdtttVxx44IHFiSeeWCQpxo0b16Ju3rx5xbve9a7igAMOKH71q18V1113XfH+97+/6Nq1a7FkyZJmtRdeeGFRqVSKsWPHFtOnTy8uueSSYptttik+97nPNatramoqevbsWeyyyy7FlClTit/85jfFUUcdVdTU1BQzZsxoVjtjxoyipqamOOqoo4rf/OY3xZQpU4qdd9656NmzZ9HU1LSh2waATYZeDQAbxwYH7LVr1xZr164tiqIonnrqqfU27REjRhQNDQ3FM888Ux1buHBh0a5du+Kss86qji1durSoq6srTjrppGaXnzhxYlGpVIq5c+dWxyZNmlQkKe66667q2OrVq4sePXoU+++/f7PL77fffkWPHj2K1atXV8f+4z/+o0hSXHHFFRu6bQDYZOjVALBxbPBLxCuVyut+wfuaNWty88035+ijj2729Tq77bZbBg4cmKlTp1bHbr311jQ1NWX06NHN5hg9enSKosgNN9xQHZs6dWr23nvv9O/fvzpWU1OTkSNH5r777ssTTzyRJHniiSdy//335z
Of+Uxqav7+TWQf+tCHstdeezU7PwBsbvRqANg42uRruh555JGsWrUqvXv3bnGsd+/eWbBgQZqampIkc+bMSZL06tWrWV1jY2MaGhqqx9fVrm/OJJk7d26zOddX+8o5AWBLpFcDQPlqXr9kwy1btixJUl9f3+JYfX19iqLIihUr0tjYmGXLlqW2tjYdO3ZstXbdXOvmXd+crzzv653/lXO+2qJFi7Jo0aJWj61YsSLz5s3LPvvsk/bt2693DgB4o1atWpWFCxfm0EMPTUNDw9t2Xr0aAN6YDenVbRKw13mtl6e98tgbrSur9rXmuPLKKzNhwoT1HgeAtjBlypR8+tOfftvPq1cDwBvzRnp1mwTsLl26JEmrjz4vX748lUolnTt3rtY2NTVl5cqV6dChQ4vaPn36NJt3fXMmf38U/PXO39qj5eucfPLJOfLII1s99qc//SknnHBCpkyZkve9733rnQMA3qh58+Zl5MiR2X333d/W8+rVAPDGbEivbpOAvccee6R9+/aZPXt2i2OzZ8/Onnvumbq6uiR/fz/X7Nmz07dv32rd4sWLs3Tp0vTs2bM61qtXr/XOmaRau+6/s2fPzmGHHdai9pVzvlpjY2MaGxtfc3/ve9/7su+++75mDQBsiLf75cx6NQBsmDfSq9vkQ85qamoyZMiQXH/99Xnuueeq448//nimT5+e4cOHV8cGDx6curq6TJ48udkckydPTqVSydChQ6tjw4YNy/z583PvvfdWx9asWZMpU6akb9++6dq1a5Jk5513zv77758pU6bk5Zdfrtbec889eeihh5qdHwC2RHo1AJTvTT2D/etf/zovvPBCtSH/+c9/zrXXXpskOeyww9KhQ4dMmDAh++23X4444oiMGTMmTU1NueCCC9LQ0JAzzzyzOld9fX3OO++8nH/++amvr88hhxyS+++/P+PHj8+JJ56YHj16VGuPP/74TJo0KSNGjMjFF1+cHXbYIVdccUUeeuih3HHHHc3W+C//8i85+OCDM2LEiJx66qlZsmRJxowZk549e7b4mhEA2Nzo1QCwEbyZL8/ebbfdiiSt/jz22GPVugceeKAYNGhQ0aFDh2Lbbbcthg4dWixYsKDVOS+77LJir732KrbZZpti1113LcaNG1e89NJLLeoWL15cHHfccUV9fX1RV1dX9OvXr7j99ttbnfM3v/lN0a9fv6Kurq6or68vjjvuuOLJJ598M1suiqIoZs6cWSQpZs6c+abnAIBXaqveolfr1QCUY0N6S6UoiuLtj/WbplmzZqVPnz6ZOXOm93UBUAq9pVyuTwDKtiG9pU3egw0AAABbGgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIAStGnA/uMf/5ihQ4ema9eu6dChQ7p3756vfe1rWblyZb
O6WbNm5aMf/Wg6deqUzp07Z/jw4Xn00UdbnfPyyy9P9+7dU1tbm27dumXChAlZvXp1i7olS5Zk1KhRaWhoSIcOHdK/f/9MmzatTfYJAJsqvRoAytNmAfvPf/5zPvShD2XhwoW59NJLc/PNN+eTn/xkvva1r+XYY4+t1s2fPz8DBgzISy+9lGuuuSZXXXVVHn744RxwwAF56qmnms05ceLEnH766Rk+fHhuu+22nHrqqbnoooty2mmnNat78cUXM2jQoEybNi2XXXZZbrzxxuy4444ZPHhw7rzzzrbaMgBsUvRqAChZ0UbOPffcIkmxYMGCZuMnnXRSkaRYvnx5URRFMWLEiKKhoaF45plnqjULFy4s2rVrV5x11lnVsaVLlxZ1dXXFSSed1Gy+iRMnFpVKpZg7d251bNKkSUWS4q677qqOrV69uujRo0ex//77v+k9zZw5s0hSzJw5803PAQCvtDF7i14NAK9vQ3pLmz2D3a5duyTJdttt12y8c+fO2WqrrbLNNttkzZo1ufnmm3P00Udn2223rdbstttuGThwYKZOnVodu/XWW9PU1JTRo0c3m2/06NEpiiI33HBDdWzq1KnZe++9079//+pYTU1NRo4cmfvuuy9PPPFEmVsFgE2SXg0A5WqzgP3Zz342nTt3zimnnJJHH300zz33XG6++eZceeWVOe2009KxY8c88sgjWbVqVXr37t3i8r17986CBQvS1NSUJJkzZ06SpFevXs3qGhsb09DQUD2+rnZ9cybJ3LlzS9snAGyq9GoAKFdNW028++675+67786wYcOyxx57VMe/+MUv5tJLL02SLFu2LElSX1/f4vL19fUpiiIrVqxIY2Njli1bltra2nTs2LHV2nVzrZt3fXO+8rytWbRoURYtWtTqsXnz5q33cgCwqdGrAaBcbRawFy5cmCFDhmTHHXfMtddem+233z733ntvLrzwwjz//PP593//92ptpVJZ7zyvPPZG6za09pWuvPLKTJgwYb3HAWBzoVcDQLnaLGCPGTMmzz77bP70pz9VH8k+8MAD09DQkOOPPz7HHXdcdtpppyStP0q9fPnyVCqVdO7cOUnSpUuXNDU1ZeXKlenQoUOL2j59+lT/3aVLl/XOmbT+KPw6J598co488shWj82bNy8jR458jV0DwKZDrwaAcrVZwP7Tn/6UHj16tHiZ2H777Zfkb++9+vCHP5z27dtn9uzZLS4/e/bs7Lnnnqmrq0vy9/dzzZ49O3379q3WLV68OEuXLk3Pnj2rY7169VrvnEma1b5aY2NjGhsb3+g2AWCTpVcDQLna7EPOunbtmrlz5+b5559vNn733XcnSXbZZZfU1NRkyJAhuf766/Pcc89Vax5//PFMnz49w4cPr44NHjw4dXV1mTx5crP5Jk+enEqlkqFDh1bHhg0blvnz5+fee++tjq1ZsyZTpkxJ375907Vr1xJ3CgCbJr0aAErWVt8VduONNxaVSqXo169f8Ytf/KKYNm1aMXHixKJTp05Fjx49ihdffLEoiqKYN29e0alTp+LAAw8sbrnlluL6668vevbsWXTt2rVYsmRJszkvvPDColKpFOecc04xY8aM4pvf/GZRW1tbfO5zn2tW19TUVLz//e8v3vOe9xQ//elPi9tvv70YNmxYUVNTU8yYMeNN78l3awJQto3ZW/RqAHh9G9Jb2ixgF0VR/Pa3vy0OOeSQYqeddirat29f7LXXXsWZZ55ZLF26tFndAw88UAwaNKjo0KFDse222xZDhw4tFixY0Oqcl112WbHXXnsV22yzTbHrrrsW48aNK1566aUWdYsXLy6OO+64or6+vqirqyv69etX3H777W9pP5o2AGXb2L1FrwaA17YhvaVSFEWx8Z4/37TMmjUrffr0ycyZM7Pvvvtu7OUAsBnQW8rl+gSgbBvSW9rsPdgAAACwJRGwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQA
kEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAoQZsH7D/84Q857LDD8u53vzvt27fPe9/73nz9619vVjNr1qx89KMfTadOndK5c+cMHz48jz76aKvzXX755enevXtqa2vTrVu3TJgwIatXr25Rt2TJkowaNSoNDQ3p0KFD+vfvn2nTprXJHgFgU6ZXA0A52jRg/+xnP8tBBx2U7bbbLj/+8Y9zyy235Oyzz05RFNWa+fPnZ8CAAXnppZdyzTXX5KqrrsrDDz+cAw44IE899VSz+SZOnJjTTz89w4cPz2233ZZTTz01F110UU477bRmdS+++GIGDRqUadOm5bLLLsuNN96YHXfcMYMHD86dd97ZllsGgE2KXg0AJSrayF//+teiY8eOxSmnnPKadSNGjCgaGhqKZ555pjq2cOHCol27dsVZZ51VHVu6dGlRV1dXnHTSSc0uP3HixKJSqRRz586tjk2aNKlIUtx1113VsdWrVxc9evQo9t9//ze9p5kzZxZJipkzZ77pOQDglTZmb9GrAeD1bUhvabNnsH/wgx/khRdeyNlnn73emjVr1uTmm2/O0UcfnW233bY6vttuu2XgwIGZOnVqdezWW29NU1NTRo8e3WyO0aNHpyiK3HDDDdWxqVOnZu+9907//v2rYzU1NRk5cmTuu+++PPHEEyXsEAA2bXo1AJSrzQL27373u9TX12f+/Pn5wAc+kJqamuywww75/Oc/n2effTZJ8sgjj2TVqlXp3bt3i8v37t07CxYsSFNTU5Jkzpw5SZJevXo1q2tsbExDQ0P1+Lra9c2ZJHPnzi1nkwCwCdOrAaBcNW018RNPPJGVK1dmxIgRGTt2bC699NLcf//9GTduXObMmZPf//73WbZsWZKkvr6+xeXr6+tTFEVWrFiRxsbGLFu2LLW1tenYsWOrtevmSpJly5atd851x9dn0aJFWbRoUavH5s2b99qbBoBNiF4NAOVqs4C9du3aNDU1Zdy4cRkzZkySZMCAAdlmm21yxhlnZNq0aenQoUOSpFKprHeeVx57o3UbWvtKV155ZSZMmLDe4wCwudCrAaBcbfYS8S5duiRJDj300GbjH/vYx5L87es+1tW09ij18uXLU6lU0rlz5+p8TU1NWblyZau1r3wUvEuXLuudM2n9Ufh1Tj755MycObPVnylTprzWlgFgk6JXA0C52uwZ7N69e+eee+5pMV7879d+bLXVVtljjz3Svn37zJ49u0Xd7Nmzs+eee6auri7J39/PNXv27PTt27dat3jx4ixdujQ9e/asjvXq1Wu9cyZpVvtqjY2NaWxsfCNbBIBNml4NAOVqs2ewjz766CTJr3/962bjt9xyS5KkX79+qampyZAhQ3L99dfnueeeq9Y8/vjjmT59eoYPH14dGzx4cOrq6jJ58uRm802ePDmVSiVDhw6tjg0bNizz58/PvffeWx1bs2ZNpk
yZkr59+6Zr165lbRMANll6NQCUrC2/L2zIkCFFbW1t8fWvf724/fbbi2984xtFXV1dccQRR1Rr5s2bV3Tq1Kk48MADi1tuuaW4/vrri549exZdu3YtlixZ0my+Cy+8sKhUKsU555xTzJgxo/jmN79Z1NbWFp/73Oea1TU1NRXvf//7i/e85z3FT3/60+L2228vhg0bVtTU1BQzZsx40/vx3ZoAlG1j9xa9GgBe24b0ljYN2CtXrizOPvvs4j3veU9RU1NT7LrrrsXYsWOLpqamZnUPPPBAMWjQoKJDhw7FtttuWwwdOrRYsGBBq3NedtllxV577VVss802xa677lqMGzeueOmll1rULV68uDjuuOOK+vr6oq6urujXr19x++23v6X9aNoAlG1j9xa9GgBe24b0lkpR/O8brXhds2bNSp8+fTJz5szsu+++G3s5AGwG9JZyuT4BKNuG9JY2ew82AAAAbEkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASvC2Buwf/OAHqVQq6dSpU4tjs2bNykc/+tF06tQpnTt3zvDhw/Poo4+2Os/ll1+e7t27p7a2Nt26dcuECROyevXqFnVLlizJqFGj0tDQkA4dOqR///6ZNm1a6fsCgM2BPg0Ab83bFrCfeOKJfOUrX0nXrl1bHJs/f34GDBiQl156Kddcc02uuuqqPPzwwznggAPy1FNPNaudOHFiTj/99AwfPjy33XZbTj311Fx00UU57bTTmtW9+OKLGTRoUKZNm5bLLrssN954Y3bccccMHjw4d955Z5vuFQA2Nfo0AJSgeJscccQRxZAhQ4rPfvazRceOHZsdGzFiRNHQ0FA888wz1bGFCxcW7dq1K84666zq2NKlS4u6urripJNOanb5iRMnFpVKpZg7d251bNKkSUWS4q677qqOrV69uujRo0ex//77v6k9zJw5s0hSzJw5801dHgBe7Z3SWzaHPl0U75zrE4DNx4b0lrflGewpU6bkzjvvzBVXXNHi2Jo1a3LzzTfn6KOPzrbbblsd32233TJw4MBMnTq1Onbrrbemqakpo0ePbjbH6NGjUxRFbrjhhurY1KlTs/fee6d///7VsZqamowcOTL33XdfnnjiiRJ3CACbLn0aAMrR5gF7yZIlOeOMM3LxxRdnl112aXH8kUceyapVq9K7d+8Wx3r37p0FCxakqakpSTJnzpwkSa9evZrVNTY2pqGhoXp8Xe365kySuXPnvvlNAcBmQp8GgPLUtPUJTj311Oy999455ZRTWj2+bNmyJEl9fX2LY/X19SmKIitWrEhjY2OWLVuW2tradOzYsdXadXOtm3d9c77yvK+2aNGiLFq0qNVj8+bNa3UcADZVm1qfTvRqAN652jRgX3fddbnpppvyxz/+MZVK5T
VrX+v4K4+90boNrV3nyiuvzIQJE9Z7OQDYXGyKfTrRqwF452qzgP3888/ntNNOyxe+8IV07do1Tz/9dJLkpZdeSpI8/fTTadeuXbp06ZKk9Ueqly9fnkqlks6dOydJunTpkqampqxcuTIdOnRoUdunT5/qv7t06bLeOZPWH4lPkpNPPjlHHnlkq8fmzZuXkSNHvsauAWDTsKn26USvBuCdq80C9tKlS/Pkk0/m29/+dr797W+3OP7ud787Rx11VK699tq0b98+s2fPblEze/bs7Lnnnqmrq0vy9/d0zZ49O3379q3WLV68OEuXLk3Pnj2rY7169VrvnEma1b5SY2NjGhsbN2CnALDp2VT7dKJXA/DO1WYfcrbTTjtl+vTpLX4OPfTQ1NXVZfr06bnwwgtTU1OTIUOG5Prrr89zzz1Xvfzjjz+e6dOnZ/jw4dWxwYMHp66uLpMnT252rsmTJ6dSqWTo0KHVsWHDhmX+/Pm59957q2Nr1qzJlClT0rdv31a/5xMAthT6NACUr82ewa6rq8uAAQNajE+ePDlbb711s2MTJkzIfvvtlyOOOCJjxoxJU1NTLrjggjQ0NOTMM8+s1tXX1+e8887L+eefn/r6+hxyyCG5//77M378+Jx44onp0aNHtfb444/PpEmTMmLEiFx88cXZYYcdcsUVV+Shhx7KHXfc0VbbBoBNgj4NAOV7W74H+/V07949M2bMSLt27fLxj388o0aNyp577pnf/e532X777ZvVnnvuubn00ktz7bXX5pBDDsnll1+eMWPGZNKkSc3qamtrM23atAwcODBf+MIXMmTIkCxatCi//vWvc9BBB72d2wOATZo+DQBvTKUoimJjL2JTMWvWrPTp0yczZ87Mvvvuu7GXA8BmQG8pl+sTgLJtSG95RzyDDQAAAJs6ARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlqNnYC9iS7T7mVxt7CVC18OLDN/YSAABgk+YZbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlKBmYy8AAOCdavcxv9rYS4CqhRcfvrGXALwOz2ADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQ1G3sBAADA5mH3Mb/a2EuAZhZefPjbej7PYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQgjYL2L/97W9z/PHHp3v37unYsWN23nnnHHXUUZk5c2aL2lmzZuWjH/1oOnXqlM6dO2f48OF59NFHW5338ssvT/fu3VNbW5tu3bplwoQJWb16dYu6JUuWZNSoUWloaEiHDh3Sv3//TJs2rfR9AsCmSq8GgHK1WcD+3ve+l4ULF+b000/PLbfckssuuyxLlixJv3798tvf/rZaN3/+/AwYMCAvvfRSrrnmmlx11VV5+OGHc8ABB+Spp55qNufEiRNz+umnZ/jw4bntttty6qmn5qKLLsppp53WrO7FF1/MoEGDMm3atFx22WW58cYbs+OOO2bw4MG5884722rLALBJ0asBoFw1bTXxpEmTssMOOzQbGzx4cPbcc89cdNFF+chHPp
IkueCCC1JbW5ubb7452267bZKkT58+ee9735tvfetb+Zd/+ZckybJly3LhhRfmc5/7XC666KIkyYABA7J69eqcd955OeOMM9KjR48kyb//+79nzpw5ueuuu9K/f/8kycCBA/OP//iPOeuss3Lvvfe21bYBYJOhVwNAudrsGexXN+wk6dSpU3r06JH//u//TpKsWbMmN998c44++uhqw06S3XbbLQMHDszUqVOrY7feemuampoyevToZnOOHj06RVHkhhtuqI5NnTo1e++9d7VhJ0lNTU1GjhyZ++67L0888URZ2wSATZZeDQDlels/5OyZZ57JrFmz8v73vz9J8sgjj2TVqlXp3bt3i9revXtnwYIFaWpqSpLMmTMnSdKrV69mdY2NjWloaKgeX1e7vjmTZO7cueVsCAA2M3o1ALx5bfYS8dacdtppeeGFF3Luuecm+dtLyZKkvr6+RW19fX2KosiKFSvS2NiYZcuWpba2Nh07dmy1dt1c6+Zd35yvPG9rFi1alEWLFrV6bN68ea+xOwDY9OnVAPDmvW0B+/zzz89Pf/rTXH755enTp0+zY5VKZb2Xe+WxN1q3obWvdOWVV2bChAnrPQ4Amyu9GgDemrclYE+YMCEXXnhhJk6cmH/+53+ujnfp0iVJ649SL1++PJVKJZ07d67WNjU1ZeXKlenQoUOL2lfeEejSpct650xafxR+nZNPPjlHHnlkq8fmzZuXkSNHrveyALCp0qsB4K1r84A9YcKEjB8/PuPHj88555zT7Ngee+yR9u3bZ/bs2S0uN3v27Oy5556pq6tL8vf3c82ePTt9+/at1i1evDhLly5Nz549q2O9evVa75xJmtW+WmNjYxobGzdghwCwadOrAaAcbfohZ1//+tczfvz4nHfeeRk3blyL4zU1NRkyZEiuv/76PPfcc9Xxxx9/PNOnT8/w4cOrY4MHD05dXV0mT57cbI7JkyenUqlk6NCh1bFhw4Zl/vz5zb7iY82aNZkyZUr69u2brl27lrdJANiE6dUAUJ42ewb729/+di644IIMHjw4hx9+eO65555mx/v165fkb4+a77fffjniiCMyZsyYNDU15YILLkhDQ0POPPPMan19fX3OO++8nH/++amvr88hhxyS+++/P+PHj8+JJ55Y/V7NJDn++OMzadKkjBgxIhdffHF22GGHXHHFFXnooYdyxx13tNWWAWCTolcDQLnaLGDfdNNNSf72nZi33npri+NFUSRJunfvnhkzZuTss8/Oxz/+8dTU1OQjH/lIvvWtb2X77bdvdplzzz0373rXuzJp0qR861vfyk477ZQxY8ZUP+l0ndra2kybNi1nnXVWvvCFL2TlypX5wAc+kF//+tc56KCD2mjHALBp0asBoFxtFrBnzJjxhmv79Onzhh+t/uIXv5gvfvGLr1u344475kc/+tEbXgMAbGn0agAoV5u+BxsAAAC2FAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKEGbfQ82QNl2H/Orjb0EaGbhxYdv7CUAAO8gnsEGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELABAACgBAI2AAAAlEDABgAAgBII2AAAAFACARsAAABKIGADAABACQRsAAAAKIGADQAAACUQsAEAAKAEAjYAAACUQMAGAACAEgjYAAAAUAIBGw
AAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQAgEbAAAASiBgAwAAQAkEbAAAACiBgA0AAAAlELDZpK15fnme/sNPs+b55Rt7KbBJ8DcDvN3c7sCG8TezaROw2aS9/PzyPPMfV+dlN0DwhvibAd5ubndgw/ib2bRt1gH7+eefzxlnnJGuXbumrq4uH/jAB/Lzn/98Yy8LAPhfejUAm5Oajb2AtjR8+PDcf//9ufjii7PXXnvlZz/7WY499tisXbs2n/rUpzb28gBgi6dXA7A52WwD9i233JLbb7+92qiTZODAgfnLX/6Sr371qznmmGOy9dZbb+RVAsCWS68GYHOz2b5EfOrUqenUqVNGjBjRbHz06NH5n//5n9x7770baWUAQKJXA7D52WwD9pw5c/K+970vNTXNn6Tv3bt39TgAsPHo1QBsbjbbl4gvW7Ys//AP/9BivL6+vnq8NYsWLcqiRYtaPfanP/0pSTJv3rxS1vji4gWlzLMlW73sv5v9lzdv1qxZG3sJr8vfzFvnb6ZcZfzdrOspq1atestzbWr06i2D253y6NVbBn8z5Xq7e/VmG7CTpFKpbPCxK6+8MhMmTHjNeUeOHPmW1kX5lt387Y29hE1enx9t7BXwdvI3U44y/24WLlyYD3/4w+VNuInQq7ccbnfeOr16y+Jvphxvd6/ebAN2ly5dWn3ke/nyv32f3LpHx1/t5JNPzpFHHtnqsRUrVmTevHnZZ5990r59+/IWC8AWa9WqVVm4cGEOPfTQjb2Ut51eDcCmYEN69WYbsHv16pWrr746a9asafbertmzZydJevbs2erlGhsb09jYuN55Bw0aVO5CAdjibYnPXCd6NQCbjjfaqzfbDzkbNmxYnn/++Vx33XXNxn/0ox+la9eu6du370ZaGQCQ6NUAbH4222ewP/axj+Xggw/OKaeckmeffTZ77rlnrr766tx6662ZMmWK79UEgI1MrwZgc1MpiqLY2ItoK88//3zOPffcXHPNNVm+fHm6d++esWPH5pOf/OTGXhoAEL0agM3LZh2wAQAA4O2y2b4HGwAAAN5OAjbvGL///e9TqVRy9dVXtzj24x//OJVKJffff3+S5IEHHsiRRx6Z+vr61NXVZZ999sk111zT7DIrV67MV77ylXTr1i11dXWpr6/PBz/4wVbnh83J+PHjU6lUMnfu3Bx77LHZbrvtsuOOO+b444/PM888U61ramrK2LFj061bt2yzzTbZeeedc9ppp+Xpp5/eeIsHNjluc2D93L/d8my2H3LGpueAAw7IPvvsk0mTJuXYY49tduz//t//m/322y/77bdfpk+fnsGDB6dv377513/912y33Xb5+c9/nmOOOSYrV67MqFGjkiRf/vKX85Of/CQXXnhh9tlnn7zwwguZM2dOq9+5Cpujo48+Osccc0xOOOGEzJ49O2PHjk2SXHXVVSmKIkOHDs20adMyduzYHHDAAfnP//zPjBs3LnfffXfuvvvu1NbWbuQdAJsStznQkvu3W6AC3kF++MMfFkmKP/7xj9Wx++67r0hS/OhHPyqKoii6d+9e7LPPPsXq1aubXfaII44oGhsbi5dffrkoiqLo2bNnMXTo0Ldt7fBOMW7cuCJJcckllzQbP/XUU4u6urpi7dq1xa233tpqzS9+8YsiSfH973//7VwysAlzmwOvzf3bLYuXiPOOcuyxx2aHHXbIpEmTqmOXX355tt9++xxzzDFZsGBB5s+fn09/+tNJkjVr1lR/DjvssCxatCgPPfRQkmT//ffPr3/964wZMyYzZszIqlWrNsqeYGM58sgjm/27d+/eaWpqypIlS/Lb3/42SaqPiK8zYsSIdOzYMdOmTXu7lglsJtzmQOvcv92yCNi8o9TW1ubkk0/Oz372szz99NN56qmncs011+TEE09MbW1tnnzyySTJV77ylbRr167Zz6mnnpokWbp0aZLku9/9bs
4+++zccMMNGThwYOrr6zN06ND813/910bbH7ydunTp0uzf615+uWrVqixbtiw1NTXZfvvtm9VUKpXstNNOXmoGbDC3OdA692+3LN6DzTvOKaeckosvvjhXXXVVmpqasmbNmnz+859PkjQ0NCRJxo4dm+HDh7d6+b333jtJ0rFjx0yYMCETJkzIk08+WX20b8iQIZk/f/7bsxl4h+rSpUvWrFmTp556qtkd3qIosnjx4uy3334bcXXA5sZtDls692+3HAI27ziNjY0ZMWJErrjiirz00ksZMmRIdt111yR/u3F573vfmwcffDAXXXTRG55zxx13zKhRo/Lggw/m0ksvzcqVK9OhQ4e22gK84w0aNCiXXHJJpkyZki996UvV8euuuy4vvPBCBg0atBFXB2xu3OawpXP/dsshYPOOdPrpp6dv375Jkh/+8IfNjl155ZX52Mc+lkMPPTSjRo3KzjvvnOXLl2fevHmZNWtWfvnLXyZJ+vbtmyOOOCK9e/fOu9/97sybNy8/+clP0r9/fzc+bPEOPvjgHHrooTn77LPz7LPP5sMf/nD1E3332WeffOYzn9nYSwQ2I25zwP3bLYWAzTvS/vvvn9133z3t27dv8aj2wIEDc99992XixIk544wzsmLFinTp0iU9evTIJz7xiWrdRz7ykfy///f/8n/+z//JypUrs/POO+e4447Lueee+3ZvB95xKpVKbrjhhowfPz4//OEPM3HixDQ0NOQzn/lMLrroIl+XA5TKbQ64f7ulqBRFUWzsRcCr/ed//mf+8R//MZMmTap+uAMAAGyq3L/dMgjYvKM88sgj+ctf/pJzzjknjz/+eBYsWODlLgAAbLLcv92y+Jou3lG+/vWv5+CDD87zzz+fX/7yl258AADYpLl/u2XxDDYAAACUwDPYAAAAUAIBGwAAAEogYAMAAEAJBGwAAAAogYANAAAAJRCwAQAAoAQCNgAAAJRAwAYAAIASCNgAAABQgv8PJftddQOLGRAAAAAASUVORK5CYII=","text/plain":["<Figure size 1000x500 with 2 Axes>"]},"metadata":{},"output_type":"display_data"}],"source":["# 定义两幅图的坐标\n","fig, axes = plt.subplots(1, 2, figsize=(10, 5))\n","\n","# 绘制两幅图\n","for i, u in enumerate(article_sim['language'].unique()):\n","    ax = axes[i]\n","    data = article_sim[article_sim['language'] == u]\n","    ax.bar(data['replicated'].unique(), data['replicated'].value_counts())\n","    ax.set_title(f'language = {u}')\n","    ax.set_ylim(0, 10000) \n","\n","# 显示   \n","fig.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"E8021516E6B64E539FC5521887F739E6","id":"306BCD1F559942D480905787A22618F5","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### 总结 (Recap)  \n","\n","回到之前的问题：如何预测研究的可重复性？  \n","\n","\n","\n","![Image 
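Name](https://cdn.kesci.com/upload/sjq96jfv8f.png?imageView2/0/w/960/h/960)  \n","\n","\n","\n","- Which beliefs can serve as the prior probability?  \n","- Which properties of the information can serve as data?  \n","- How do we combine the prior and the data to update our beliefs (Bayes' rule)?  "]},{"cell_type":"markdown","metadata":{"_id":"9116A7CE222F4BDE8B8E3607286E0966","id":"473E9272AD19403FA89171CAEDA88987","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["## Part 3: Bayesian Models for Random Variables"]},{"cell_type":"markdown","metadata":{"_id":"E3B392ED97F84AE2A0B122A3581A9C4E","id":"0F2FAAC7BE214EE69C2E2464F502D7EF","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Random variables**  \n","\n","So far we have analyzed a single event: whether a particular study replicates.  \n","\n","The same logic extends to more abstract and general quantities, namely **random variables**.  \n","\n","Suppose that, to study the replication problem, a capable and well-funded research team plans a series of replication experiments and wants to know what proportion of them will succeed.  \n","\n","First, consider the idea of a win rate, or success rate:  \n","\n","* Imagine playing the card game Dou Dizhu in best-of-five or best-of-seven matches; after a round of games, you have a win rate.  \n","* The win rate is not fixed: it changes with every game you win or lose.  \n","* Before a round begins, you do not know what your win rate will turn out to be.  "]},{"cell_type":"markdown","metadata":{"_id":"3FD06461908142E698C9D59BBF2E0D18","id":"EE3DB18593334B1E91AA270536C49BA4","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["In our example, suppose the team plans to replicate 6 studies.  \n","- Let $\\pi$ be the team's probability of successfully replicating any given study. Because $\\pi$ is **unknown and may vary**, we treat $\\pi$ as a random variable.  \n","- Based on the team's past experience and the current state of psychological research, we guess that its replication success rate is $\\pi = 50\\%$.  \n","- The number of successful replications, $Y$, could be 0, 1, and so on up to 6, giving 7 possible outcomes: $Y \\in \\{0,1,2,3,4,5,6\\}$ 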
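"]},{"cell_type":"markdown","metadata":{},"source":["Before computing exact probabilities, a quick simulation makes the idea concrete: even with $\\pi$ fixed at 0.5, the number of successes $Y$ out of 6 replications changes from one hypothetical round of studies to the next. This is only a sketch; the seed, the 10,000 simulated rounds, and the names `rng`, `y_sim`, and `freq` are illustrative choices, not part of the original analysis. The printed shares should land close to the exact binomial probabilities derived next (for example, roughly 0.31 for $Y = 3$)."]},
{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["import numpy as np\n","\n","# Illustrative sketch: simulate many hypothetical rounds of 6 replications\n","rng = np.random.default_rng(2024)   # arbitrary seed, for reproducibility only\n","n_studies = 6                       # replications per round\n","pi = 0.5                            # assumed success rate\n","n_rounds = 10_000                   # number of simulated rounds (arbitrary)\n","\n","# Each entry is the number of successful replications Y in one round\n","y_sim = rng.binomial(n=n_studies, p=pi, size=n_rounds)\n","\n","# Relative frequency of each possible value Y = 0, 1, ..., 6\n","freq = np.bincount(y_sim, minlength=n_studies + 1) / n_rounds\n","for y_value, share in enumerate(freq):\n","    print(f'Y = {y_value}: {share:.3f}')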
"]},{"cell_type":"markdown","metadata":{"_id":"5097DA54E0F443E7A8E87A5310133378","id":"6EF130B8CC954D59BE31831CDC4A9B24","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["🤔 Although we take the team's success rate to be $\\pi = 50\\%$, the question remains: how likely is each possible number of successful replications (0 to 6)? "]},{"cell_type":"markdown","metadata":{"_id":"713A6C44D59C4EDAB7EE7E6B0770503F","id":"608DF357F3834E9790FD8BD15B955F14","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**The binomial model**  \n","\n","Each replication has only two possible outcomes: success or failure.  \n","\n","The team runs 6 replications in total, and we want the probability of 0 successes, 1 success, 2 successes, and so on up to 6.  \n","\n","The binomial distribution is the natural model for this situation.  \n"]},{"cell_type":"markdown","metadata":{"_id":"D03769BD6D2B4C808F77AC844D213504","id":"8F06AC4C2F334EECA836658FE02193E2","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Given the team's success rate $\\pi$, the probability of observing a particular number of successes is:  \n","\n","$$  \n","f(y|\\pi) = \\binom{n}{y} \\pi^{y}(1-\\pi)^{n-y} \\quad\\quad \\text{for } y \\in \\{0,1,2,\\dots,n\\}  \n","$$  \n","$$  \n","\\binom{n}{y} = \\frac{n!}{y!(n-y)!}  \n","$$  \n","\n","Here $\\pi$ is the probability of success and $y$ is the number of successes in $n$ trials. The binomial model assumes that:  \n","\n","(1) all trials are mutually independent;  \n","\n","(2) the probability of success is the same fixed value $\\pi$ on every trial.  "]},{"cell_type":"markdown","metadata":{"_id":"B192E3FF5B4F439391B43E16A0B1A2A7","id":"EFF1D40190FB4DB1B4EFA466A4FFD1DA","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["The probabilities of 0 to 6 successes can be written as:  \n","\n","$$  \n","f(Y=0|\\pi=0.5) = \\binom{6}{0} 0.5^0 (1-0.5)^{6}  \n","$$  \n","$$  \n","f(Y=1|\\pi=0.5) = \\binom{6}{1} 
0.5^1 (1-0.5)^{5}  \n","$$  \n","$$  \n","\\vdots  \n","$$  \n","$$  \n","f(Y=5|\\pi=0.5) = \\binom{6}{5} 0.5^{5} (1-0.5)^{1}  \n","$$  \n","$$  \n","f(Y=6|\\pi=0.5) = \\binom{6}{6} 0.5^{6} (1-0.5)^{0}  \n","$$  "]},{"cell_type":"markdown","metadata":{"_id":"8D0F65AFF8784B9AA6D5DDC0D0B530D8","id":"02AA0477EC3C4C51BA8136C40D904DAF","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["We can compute these probabilities in code with  \n"," `st.binom.pmf(y, n, p)`, where `p` corresponds to $\\pi$ in the formula."]},{"cell_type":"code","execution_count":14,"metadata":{"_id":"9B6ABDBAC2184D748BC14210FEC09798","collapsed":false,"id":"1905BE68168F4B1F91B4BCDB8BCF072D","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Data loading and manipulation: pandas\n","import pandas as pd\n","# Numerical arrays: numpy\n","import numpy as np\n","# Basic plotting: matplotlib\n","import matplotlib.pyplot as plt\n","# Higher-level plotting: seaborn, imported as sns\n","import seaborn as sns\n","# Statistical distributions: scipy.stats, imported as st\n","import scipy.stats as st \n","\n","# Plot style loosely following APA 7\n","plt.rcParams.update({\n","    'figure.figsize': (4, 3),      # canvas size\n","    'font.size': 12,               # base font size\n","    'axes.titlesize': 12,          # title font size\n","    'axes.labelsize': 12,          # axis-label font size\n","    'xtick.labelsize': 12,         # x tick-label font size\n","    'ytick.labelsize': 12,         # y tick-label font size\n","    'lines.linewidth': 1,          # line width\n","    'axes.linewidth': 1,           # axis-line width\n","    'axes.edgecolor': 'black',     # black axis lines\n","    'axes.facecolor': 'white',     # white axes background\n","    'xtick.direction': 'in',       # x ticks point inward\n","    'ytick.direction': 'out',      # y ticks point outward\n","    'xtick.major.size': 6,         # x major tick length\n","    'ytick.major.size': 6,         # y major tick length\n","    'xtick.minor.size': 4,         # x minor tick length (if minor ticks are enabled)\n","    'ytick.minor.size': 
4,         # y minor tick length (if minor ticks are enabled)\n","    'xtick.major.width': 1,        # x major tick width\n","    'ytick.major.width': 1,        # y major tick width\n","    'xtick.minor.width': 0.5,      # x minor tick width (if minor ticks are enabled)\n","    'ytick.minor.width': 0.5,      # y minor tick width (if minor ticks are enabled)\n","    'ytick.labelleft': True,       # show y tick labels on the left\n","    'ytick.labelright': False      # hide y tick labels on the right\n","})"]},{"cell_type":"code","execution_count":15,"metadata":{"_id":"964344C1AE4D492EBEE44169E13DAD6F","collapsed":false,"id":"AB4FBD69A1374FE08CBDA061A61CE18D","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["y = [0,1,2,3,4,5,6]  # number of successes\n","n = 6                # total number of replication studies\n","p = 0.5              # assumed success probability\n","\n","# Compute the probability of each success count\n","prob = st.binom.pmf(y, n, p)\n","\n","result_table = pd.DataFrame({'successes': y, 'probability': prob})\n","result_table"]},{"cell_type":"markdown","metadata":{"_id":"0E0EE993EC7C4246864A50147A8E2736","id":"7B1C1AB3B7BC492FBBB18D52225998C3","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["When the team's success probability is 0.5, *y* = 3 successes is the most likely outcome across the six studies, with $f(y=3|\\pi=0.5) = 0.3125$."]},{"cell_type":"code","execution_count":16,"metadata":{"_id":"41D47C047ABC4817902F877AC116E071","collapsed":false,"id":"B2770EEE190242F6B3C59D9C9932A5F7","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Draw gray vertical lines (stems)\n","for i, j in zip(y, prob):\n","    plt.plot([i, i], [0, j], color='gray', linestyle='-', linewidth=1, zorder=1)\n","\n","# Draw black points: the probability of each success count\n","plt.scatter(y, prob, c='black')\n","\n","plt.ylabel(r'$f(y|\\pi)$')\n","plt.xlabel('y')\n","\n","plt.xlim(-0.2,6.2)\n","plt.ylim(0,0.5)\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"8041DF2125114C788A52694D4B45784E","id":"30B1E04A446B4992A48DA9028E30F7AE","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["\n","**Probability mass function (pmf):** describes the probability that a discrete random variable takes each specific value.  \n","\n","The figure above shows that the number of successes *y* has a different probability at each value.  \n","\n","* Because $Y$ can take only a finite (countable) set of values and its value is random, $Y$ is a discrete random variable, and the function $f(y)$ giving the probability of each value is its probability mass function.  \n","\n","\n","For a discrete random variable $Y$, the probability of each value is given by $f(y)$:  \n","$$  \n","f(y) = P(Y=y)  \n","$$  \n","\n","with the following properties:  \n","\n","* $0\\leq f(y) \\leq 1$ for every value of $y$;  \n","* $\\sum_{\\text{all } y}f(y) = 1$, i.e., the probabilities of all values of $y$ sum to 1."]},{"cell_type":"code","execution_count":17,"metadata":{"_id":"05FC709EDC61405080FAEF84FC525F6E","collapsed":false,"id":"734046F659A140CCAA1B6780E29C48FB","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# The pmf values over all y should sum to 1\n","result_table['probability'].sum()"]},{"cell_type":"markdown","metadata":{"_id":"AA8EDB178D4B4E89A7544F8DE8489F30","id":"DF201A487DBB411A89E51928FEF0B232","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### The binomial likelihood function"]},{"cell_type":"markdown","metadata":{"_id":"49B4DD60B3174E9297B696DAEC178564","id":"A52AEDE06598430085D53474BED4E44A","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Different beliefs**  \n","\n","We assumed that the team's success rate across the 6 replications is 50%, but not everyone agrees.    \n","\n","- Optimists believe the team's success probability is 0.8, expressing high confidence that the studies will replicate.  \n","- Pessimists believe the team's success probability is only 
0.2, meaning they are not very optimistic that the studies will replicate.  \n","\n","The assumed success probability shapes what each group expects: if the team's success probability is high, more of the 6 studies should replicate;  \n","conversely, if it is low, more of the replications should fail.  \n","\n","For each belief, we can compute and plot the probability distribution of the number of successful replications among the 6 studies."]},{"cell_type":"code","execution_count":18,"metadata":{"_id":"09496DE9C3554C5899C273188217941F","collapsed":false,"id":"D75FE92EF3A246269A9B589AC0BC7762","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["y = [0,1,2,3,4,5,6]  # number of successes\n","n = 6                # total number of studies\n","\n","# Compute f(y|pi) under each belief\n","p = 0.5              # success probability assumed for the team\n","likelihood1 = st.binom.pmf(y, n, p)\n","p = 0.8              # optimists' success probability\n","likelihood2 = st.binom.pmf(y, n, p)\n","p = 0.2              # pessimists' success probability\n","likelihood3 = st.binom.pmf(y, n, p)\n","\n","result_table = pd.DataFrame({\n","  'successes': y,\n","  'team (pi=0.5)': likelihood1,\n","  'optimists (pi=0.8)': likelihood2,\n","  'pessimists (pi=0.2)': likelihood3})\n","result_table"]},{"cell_type":"code","execution_count":19,"metadata":{"_id":"DC62F21B049746FDA1E4D7D3062A8A9F","collapsed":false,"id":"CD0C1F54D18F402B98231092EE4D0ADB","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Create subplots\n","fig, axs = plt.subplots(1, 3, figsize=(10, 4))\n","\n","# Draw three panels, each styled like the previous figure\n","three_pi = [r'Team itself ($\\pi = 0.5$)', r'Optimists ($\\pi = 0.8$)', r'Pessimists ($\\pi = 0.2$)']\n","likelihoods = [likelihood1, likelihood2, likelihood3]\n","for i, ax in enumerate(axs):\n","    \n","    ax.scatter(y, likelihoods[i], c='black')\n","    \n","    for xx, yy in zip(y, likelihoods[i]):\n","        ax.plot([xx, xx], [0, yy], color='gray', linestyle='-', linewidth=1, zorder=1)\n","    \n","    # One panel (facet) per belief\n","    ax.set_title(three_pi[i])\n","\n","    ax.set_xlim(-0.2,6.2)\n","    
ax.set_ylim(0,0.4)\n","\n","fig.supylabel(r'$f(y|\\pi)$')\n","fig.supxlabel('y')\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"ECAB551A37B54158AC2B5734427CAD1E","id":"82230022EA50413AAFD002E5BFB55B70","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Under the optimists' assumption ($\\pi = 0.8$), high success counts dominate: 5 successes is the single most likely outcome (about 0.393), followed by 6 (about 0.262). Under the pessimists' assumption ($\\pi = 0.2$) the picture is mirrored: 1 success is the most likely outcome (about 0.393), followed by 0 (about 0.262).  \n","- In other words, if the team replicates only one of the 6 studies, that outcome is quite plausible under a low success rate (the pessimists' scenario) and almost impossible under a high success rate (the optimists' scenario).  \n","- The team's success rate is therefore more likely (in the sense of likelihood) to be what the pessimists assumed ($\\pi = 0.2$).  \n","\n","For example, the code below computes, under each belief (each value of $\\pi$), the probability that only 1 of the 6 studies replicates, i.e., the likelihood.  "]},{"cell_type":"code","execution_count":20,"metadata":{"_id":"44F114058962412AAEEF8D6DA01A1FB8","collapsed":false,"id":"E16BAF3537A646D697966C6B0B29AB11","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Observed number of successes and total number of studies\n","y = 1  # observed number of successes (a single value)\n","n = 6  # total number of studies\n","\n","# Likelihood of y = 1 under three different success probabilities p\n","p_values = [0.5, 0.8, 0.2]          # the three candidate success rates\n","likelihoods = []                    # stores the likelihood for each success rate\n","\n","for p in p_values:\n","    likelihood = st.binom.pmf(y, n, p)  # L(pi|y=1) = f(y=1|pi)\n","    likelihoods.append(likelihood)\n","\n","\n","# Create the figure and axes\n","fig, ax = plt.subplots()\n","ax.scatter(p_values, likelihoods, c='black')\n","# pi is a probability, so the x-axis only needs to span [0, 1]\n","ax.set_xlim(0, 1)\n","ax.set_ylim(0, 0.5)\n","for xx, yy in zip(p_values, likelihoods):\n","    # each stem runs from the x-axis up to its likelihood value\n","    ax.plot([xx, xx], [0, yy], color='gray', linestyle='-', linewidth=1, zorder=1)\n","# The y-axis shows the likelihood L(pi|y=1) = f(y=1|pi), not the posterior\n","ax.set_ylabel(r'$L(\\pi|y=1)$')\n","ax.set_xlabel(r'$\\pi$')\n","plt.tight_layout()  # 
adjust the layout to avoid overlapping labels\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"72AC4CDC23224E4D922F9C883E946616","id":"CA27669176924C0CA2E9F06C49C416DF","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**The likelihood function**  \n","\n","When the team replicates only one study, the probability of this event under each success rate can be written as:  \n","\n","$$  \n","f(Y=1|\\pi=0.2) = \\binom{6}{1} 0.2^1 (1-0.2)^{5}  \n","$$  \n","$$  \n","f(Y=1|\\pi=0.5) = \\binom{6}{1} 0.5^1 (1-0.5)^{5}  \n","$$  \n","$$  \n","f(Y=1|\\pi=0.8) = \\binom{6}{1} 0.8^1 (1-0.8)^{5}  \n","$$  \n","\n","The likelihood function for one successful replication is therefore  \n","\n","$$  \n","L(\\pi|y=1) = f(y=1|\\pi) = \\binom{6}{1} \\pi^{1}(1-\\pi)^{6-1} = 6\\pi(1-\\pi)^{5}  \n","$$  \n","\n","Likelihood under each success rate:  \n","\n","| $\\pi$ | 0.2 | 0.5 | 0.8 |  \n","|-------|-------|-------|-------|  \n","| $L(\\pi \\| y=1)$ | 0.3932 | 0.0938 | 0.0015 |  \n","\n","**Note:**  \n","\n","The likelihood function gives, for each possible success rate $\\pi$, the probability of the observed outcome $Y=1$. Therefore:  \n","1. once the data $y=1$ are fixed, the likelihood depends only on $\\pi$;  \n","2. 
the likelihood values need not sum to 1 over $\\pi$: each value is the probability of the same data under a different $\\pi$, so $L(\\pi|y)$ is a function of $\\pi$ but not a probability distribution over $\\pi$."]},{"cell_type":"markdown","metadata":{"_id":"D0F1C949052B4862A5FF3F952B56246C","id":"44277785C6124A7D8CC7F49DB1E35396","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Conditional probability vs. likelihood  \n","\n","🤓  \n","When $\\pi$ is fixed, the conditional probability mass function $f(\\cdot|\\pi)$ lets us compare how probable different data values of $Y$ (e.g., $y_{1},y_{2}$) are under that value of $\\pi$ (that model).  \n","$$  \n","f(y_{1}|\\pi) \\; \\text{vs.} \\; f(y_{2}|\\pi)  \n","$$  \n","\n","When the data $Y = y$ are fixed, the likelihood function $L(\\cdot|y)= f(y|\\cdot)$ lets us compare how likely the observed data $y$ are under different models, i.e., different values of the binomial parameter $\\pi$ (e.g., $\\pi_{1},\\pi_{2}$). This is a relative likelihood.  \n","\n","\n","$$  \n","L(\\pi_{1}|y) \\; \\text{vs.} \\; L(\\pi_{2}|y)  \n","$$  \n","$$  \n","\\text{i.e.}  \n","$$  \n","$$  \n","f(y|\\pi_{1}) \\; \\text{vs.} \\; f(y|\\pi_{2})  \n","$$  \n"]},{"cell_type":"markdown","metadata":{"_id":"37CA7578F4774A4A80CE7B13B1DCFBE0","id":"11A59BC5521C4320962FCEDCD9835765","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Under the binomial model...**  \n","\n","With $n = 6$ replications, the relationship between the number of successes and the success rate follows a binomial model, written as:  \n","\n","$$  \n","Y|\\pi \\sim \\text{Bin}(6,\\pi)  \n","$$  \n","\n","\n","$$  \n","f(y|\\pi) = \\binom{6}{y} \\pi^{y}(1-\\pi)^{6-y} \\quad\\quad \\text{for } y \\in \\{0,1,2,3,4,5,6\\}  \n","$$  \n","\n","-----------------------------------  \n","\n","The figure below shows several values of $\\pi$; for each, the probability model gives the probability of every possible value of $Y$.  \n","* It also highlights the likelihood of one particular data pattern, $Y=1$ (one success), under each value of $\\pi$ (each model).  "]},{"cell_type":"code","execution_count":21,"metadata":{"_id":"880BEABE08684E0098F7AF48EAAF80F3","collapsed":false,"id":"199B46EB9B9B4E0E862610D24A587EA8","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Values for Y (number of successes in n trials)\n","y = np.arange(0, 
7)\n","# Number of trials (n) and different probabilities (π values)\n","n = 6\n","pi_values = [0.2, 0.5, 0.8]\n","\n","# Create subplots\n","fig, axs = plt.subplots(1, 3, figsize=(12, 5))\n","# Loop over each pi value to plot the corresponding binomial distribution\n","for i, pi in enumerate(pi_values):\n","    # Binomial probabilities f(y|pi) for each y (number of successes)\n","    likelihoods = st.binom.pmf(y, n, pi)\n","    \n","    # Scatter plot of f(y|pi)\n","    axs[i].scatter(y, likelihoods, color='black', zorder=2)\n","    \n","    # Draw gray vertical lines\n","    for yy, likelihood in zip(y, likelihoods):\n","        axs[i].plot([yy, yy], [0, likelihood], color='gray', linestyle='-', linewidth=1, zorder=1)\n","    \n","    # Highlight y = 1 with a thick black line: its height is the likelihood L(pi|y=1)\n","    axs[i].plot([1, 1], [0, st.binom.pmf(1, n, pi)], color='black', linewidth=3, zorder=3)\n","    \n","    # Title with binomial parameters\n","    axs[i].set_title(f'Bin({n},{pi})')\n","    \n","    # Set y and x axis limits\n","    axs[i].set_xlim(-0.5, 6.5)\n","    axs[i].set_ylim(0, 0.5)\n","\n","# Global labels\n","fig.supylabel(r'$f(y|\\pi)$')\n","fig.supxlabel('y')\n","\n","# Adjust layout for better fit\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"45541A93BE7B4F15B21B9284010EE053","id":"CEA95B2883354165B950F960D06CB637","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### **Prior** probability model"]},{"cell_type":"markdown","metadata":{"_id":"8F39E31E1E4D40299FC0523F74F844E3","id":"FAE5F28DE44446C4A399FEA40B7C8EDE","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Building the prior model**  \n","\n","As the discussion above shows, the binomial parameter $\\pi$ can itself vary, so it too can be treated as a random variable.  
\n","\n","- For example, as a more thoughtful observer, we might combine the estimates of $\\pi$ held by the optimists, the pessimists, and the neutral team.  \n","- However, we may hold different degrees of belief in these three views.  \n","\n","Suppose we are optimistic overall but do not rule out the pessimists' view; we then assign each of the three views a probability (the prior).  \n","- For example, we could set $f(\\pi=0.2) = 0.1$ or $f(\\pi=0.2) = 0.5$; the only requirement is that all $f(\\pi)$ sum to 1.  \n","\n","\n","| $\\pi$ | 0.2 | 0.5 | 0.8 | Total |  \n","|----------|-----|-----|-----|-----|  \n","| $f(\\pi)$ | 0.10 | 0.25 | 0.65 | 1 |  \n","\n"]},{"cell_type":"markdown","metadata":{"_id":"C4F7FE361F3B4A6F9838D276D5CE8611","id":"B552B6DCD27D4CBA955B5A3C454D03A9","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["The number of candidate values of $\\pi$ is also up to us.  \n","\n","- For example, we could also include a very pessimistic possibility: that the team's success rate is only 0.01, i.e., $\\pi = 0.01$.  \n","- The resulting prior distribution might then look like this:  \n","\n"," \n","| $\\pi$    |   0.01  | 0.2  | 0.5  | 0.8  | Total |  \n","| -------- | --- | ---- | ---- | ---- | ----- |  \n","| $f(\\pi)$ |  0.10   | 0.10 | 0.25 | 0.55 | 1     |  \n","\n","\n"]},{"cell_type":"markdown","metadata":{"_id":"38E98221461549329E95D8DA03AD6D66","id":"1C61EB98A68648C891934A6F50E534DF","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### Posterior probability model"]},{"cell_type":"markdown","metadata":{"_id":"64BEDA6DE1DD49E18C1669669168502C","id":"3B67DC97ABE94D7E8B77894796FB3602","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Under the first prior model above, we were optimistic overall and considered a high success rate quite likely ($f(\\pi=0.8) = 0.65$).  \n","\n","\n","\n","| $\\pi$ | 0.2 | 0.5 | 0.8 | Total |  \n","|----------|-----|-----|-----|-----|  \n","| $f(\\pi)$ | 0.10 | 0.25 | 0.65 | 1 |  \n","\n","\n","\n","In the end, however, the team replicated only one of the six studies.  
\n","\n","How should these new data change our beliefs?"]},{"cell_type":"markdown","metadata":{"_id":"6E5372D39F5D48D58461B9A071A46D9C","id":"99BD84BB6ED14F57B09254FE00FAE21D","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Computing the posterior model**  \n","\n","The posterior can be written as:  \n","\n","$$  \n","f(\\pi|y=1)  \n","$$  \n","\n","i.e., the probability distribution of the team's success rate $\\pi$ given that it replicated only one study.  \n","\n","By Bayes' rule, the posterior expands to:  \n","\n","$$  \n","\\text{posterior} = \\frac{\\text{prior} \\times \\text{likelihood}}{\\text{normalizing constant}}  \n","$$  \n","\n","$$  \n","f(\\pi|y=1) = \\frac{ f(\\pi)L(\\pi|y=1)} {f(y=1)} \\quad\\quad \\text{for } \\pi \\in \\{0.2,0.5,0.8\\}  \n","$$  \n","\n","The normalizing constant is the total probability of the data, summed over the three values of $\\pi$:  \n","\n","$$  \n","f(y=1) = 0.10 \\times 0.3932 + 0.25 \\times 0.0938 + 0.65 \\times 0.0015 \\approx 0.0637  \n","$$  \n","\n","$$  \n","f(\\pi=0.2|y=1) = \\frac{0.10 \\times 0.3932} {0.0637} \\approx 0.617  \n","$$  \n","$$  \n","f(\\pi=0.5|y=1) = \\frac{0.25 \\times 0.0938} {0.0637} \\approx 0.368  \n","$$  \n","$$  \n","f(\\pi=0.8|y=1) = \\frac{0.65 \\times 0.0015} {0.0637} \\approx 0.015  \n","$$ \n","\n"]},{"cell_type":"markdown","metadata":{"_id":"8436F519D4D24363BC40B88942193A50","id":"EA407E43E22F46FBB36F3A6E8A9E9C60","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["The table below summarizes the posterior model. After observing only one successful replication, the probability that the team's success rate is high ($\\pi$ = 0.8) drops from 0.65 to 0.015.  \n","\n","\n","| $\\pi$ | 0.2 | 0.5 | 0.8 | Total |  \n","|---------------|-------|-------|-------|-----|  \n","| $f(\\pi)$ | 0.10 | 0.25 | 0.65 | 1 |  \n","| $f(\\pi \\| y=1)$ | 0.617 | 0.368 | 0.015 | 1 |  \n"]},{"cell_type":"markdown","metadata":{"_id":"08F23756A40C45C19D184E7F115C95E0","id":"98A72E08C66047AAAB79BCF043DDAE90","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["**Supplementary material**  \n","\n","Skipping the denominator  \n","- 
In Bayesian statistics, the posterior probability is computed as:\n","\n","$$\n","P(\\pi|y) = \\frac{P(y|\\pi) \\cdot P(\\pi)}{P(y)} \\propto P(y|\\pi) \\cdot P(\\pi)\n","$$\n","\n","- **$P(\\pi|y)$** is the posterior probability.\n","- **$P(y|\\pi)$** is the likelihood.\n","- **$P(\\pi)$** is the prior probability.\n","- The denominator **$P(y)=\\sum_{\\pi} P(y|\\pi) \\cdot P(\\pi)$** is the marginal likelihood (for a continuous $\\pi$, the sum becomes the integral $\\int P(y|\\pi) \\cdot P(\\pi)\\, d\\pi$).\n","\n","Because the denominator **$P(y)$** (the marginal likelihood) is the same constant for every value of **$\\pi$**, we can drop it when comparing posterior probabilities across values of $\\pi$.\n"]},{"cell_type":"markdown","id":"a9daafd4","metadata":{},"source":["\n","Since $f(y)$ is a normalizing constant that does not depend on $\\pi$, the posterior probability mass function $f(\\pi|y)$ is proportional to the product of $f(\\pi)$ and $L(\\pi|y)$:  \n","\n","$$  \n","f(\\pi | y) = \\frac{f(\\pi)L(\\pi|y)}{f(y)} \\propto f(\\pi)L(\\pi|y)  \n","$$  \n","that is,  \n","\n","$$  \n","\\text{posterior} \\propto \\text{prior} \\cdot \\text{likelihood}  \n","$$  \n","\n","Without the denominator, the posterior calculation becomes:  \n","$$  \n","f(\\pi=0.2|y=1) \\propto 0.10 \\cdot 0.3932 = 0.039320  \n","$$  \n","\n","$$  \n","f(\\pi=0.5|y=1) \\propto 0.25 \\cdot 0.0938 = 0.023450  \n","$$  \n","$$  \n","f(\\pi=0.8|y=1) \\propto 0.65 \\cdot 0.0015 = 0.000975  \n","$$  \n","\n","Here $\\propto$ means *proportional to*. These unnormalized posterior values do not sum to 1:  \n","$$  \n","0.039320 + 0.023450 + 0.000975 = 0.063745,  \n","$$  \n","but their ratios are unchanged (see the figure below). They can therefore still be used to compare posterior probabilities across parameter values, which is why we can focus on **$P(y|\\pi) \\cdot P(\\pi)$** without computing the full posterior."]},{"cell_type":"markdown","id":"42135bc0","metadata":{},"source":["**Proportionality**  \n","\n","> 
😜 This property matters a great deal. The denominator is often expensive to compute because it requires summing (or integrating) over every possible parameter value, and with more than one parameter the cost grows quickly. Methods that can obtain the posterior without computing the denominator, such as the MCMC algorithms introduced later, are therefore very useful in practice."]},{"cell_type":"code","execution_count":23,"metadata":{"_id":"E2DC4ED06D5D468495F62290EAE2A323","collapsed":false,"id":"2E96EDB9D4F64E258E1620D7D81A0633","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Pi values and corresponding unnormalized posterior values (prior x likelihood)\n","pi_values = [0.2, 0.5, 0.8]\n","unnormalized_posterior = [0.03932, 0.02345, 0.000975]  # Unnormalized posterior values\n","normalized_posterior = [val / sum(unnormalized_posterior) for val in unnormalized_posterior]  # Normalized\n","\n","# Create subplots for normalized and unnormalized posteriors\n","fig, axs = plt.subplots(1, 2, figsize=(8, 4))\n","\n","# Normalized posterior\n","axs[0].scatter(pi_values, normalized_posterior, color='black', zorder=2)\n","for xx, yy in zip(pi_values, normalized_posterior):\n","    axs[0].plot([xx, xx], [0, yy], color='black', linewidth=3, zorder=1)\n","axs[0].set_title(r'Normalized $f(\\pi | y=1)$')\n","axs[0].set_xlim(0.15, 0.85)\n","axs[0].set_ylim(0, 0.7)\n","\n","# Unnormalized posterior\n","axs[1].scatter(pi_values, unnormalized_posterior, color='black', zorder=2)\n","for xx, yy in zip(pi_values, unnormalized_posterior):\n","    axs[1].plot([xx, xx], [0, yy], color='black', linewidth=3, zorder=1)\n","axs[1].set_title(r'Unnormalized $f(\\pi | y=1)$')\n","axs[1].set_xlim(0.15, 0.85)\n","axs[1].set_ylim(0, 0.05)\n","\n","# Set labels and layout\n","for ax in axs:\n","    
ax.set_xlabel(r'$\\pi$')\n","\n","fig.supylabel('Probability')\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"039AEF64DD0447A8AB7E4DD44248D1EF","id":"E423D4DA6065413883F1D73C795C23D9","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Normalizing these unnormalized posterior values by their sum gives the same result as before:  \n","\n","$$  \n","f(\\pi = 0.2 | y = 1) = \\frac{0.039320}{0.039320 + 0.023450 + 0.000975} \\approx 0.617  \n","$$  \n","\n","Note that the denominator is the sum of prior × likelihood over all values of $\\pi$ (not the sum of the likelihoods alone), so the posterior can also be written as:  \n","\n","$$  \n","f(\\pi | y) = \\frac{f(\\pi)L(\\pi|y)}{f(y)} = \\frac{f(\\pi)L(\\pi|y)}{\\sum_{\\text{all } \\pi} f(\\pi)L(\\pi|y)} .  \n","$$"]},{"cell_type":"markdown","metadata":{"_id":"F4A3D160AE5E4181A769E34E023A7033","id":"EA2EB8F21DC54AF5A0AED9C66A000F39","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### Posterior simulation (with code)"]},{"cell_type":"markdown","metadata":{"_id":"744BDB3097544121BB3C1C271A5EEEE0","id":"115C8CB90FFC4BBB98035C8F8E4301C7","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### 1. 
Define the prior model  \n","- Define the candidate success rates  \n","- Define the prior probability of each success rate (these must sum to 1)"]},{"cell_type":"code","execution_count":24,"metadata":{"_id":"AE14DA7DBF2E4C4AAC85A3A7A09A81B6","collapsed":false,"id":"E68BBC4D52BA43FC9BBA149F645B5DA0","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["import pandas as pd\n","import numpy as np\n","\n","# Candidate success rates\n","replicated = pd.DataFrame({'pi':[0.2, 0.5, 0.8]})\n","\n","# Prior probability of each success rate\n","prior = [0.10, 0.25, 0.65]"]},{"cell_type":"markdown","metadata":{"_id":"A5593998CBBE463084B2DB31E612BAEB","id":"7AEDC92C0D58482DAE485896D4BB3845","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### 2. Simulate the number of successes among the 6 studies for a given success rate  \n","- Repeat this process 10,000 times"]},{"cell_type":"code","execution_count":25,"metadata":{"_id":"EB13D3BBC8AA444EB4751606BAE3948F","collapsed":false,"id":"AA582F8D251044BF9D8D1A8DD4FFCFBE","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Set the random seed for reproducibility\n","np.random.seed(84735)\n","\n","# Draw 10,000 values of pi from the prior, then simulate the matching y for each\n","\n","replicated_sim = replicated.sample(n=10000, weights=prior, replace=True)\n","replicated_sim['y'] = np.random.binomial(n=6, p=replicated_sim['pi'], size=len(replicated_sim))\n","replicated_sim.head(10)"]},{"cell_type":"code","execution_count":26,"metadata":{"_id":"178104DAE86849BE8666A37AA8AB472E","collapsed":false,"id":"3897FD3D995F4A9E9A22E4DDD683964D","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Summarize how often each value of pi was drawn (the shares should be close to the prior)\n","replicated_counts =  replicated_sim['pi'].value_counts().reset_index()\n","\n","replicated_counts.columns = ['pi','n']\n","\n","replicated_counts['percentage'] = 
(replicated_counts['n']/len(replicated_sim))\n","\n","replicated_counts = replicated_counts.sort_values(by='pi')\n","\n","print(replicated_counts)"]},{"cell_type":"markdown","metadata":{"_id":"AD3A5819F20D461597DD4193AF4D8A37","id":"69E4FCCFDB814FC394C8BA36278F111F","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### 3. Distribution of the number of successes under each success rate, $f(y|\\pi)$"]},{"cell_type":"code","execution_count":27,"metadata":{"_id":"0F1BFF8B7EB74713A32B5E4C20A98CFA","collapsed":false,"id":"2A2ADF4EAA834FF9A71D3F4819047CE9","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Import the plotting library seaborn\n","import seaborn as sns\n","# FacetGrid draws one panel per value of the column variable (here, pi)\n","replicated_lik = sns.FacetGrid(replicated_sim,col='pi')\n","replicated_lik.map(sns.histplot,'y',stat='probability',discrete=True)\n","plt.tight_layout()\n","plt.show()"]},{"cell_type":"markdown","metadata":{"_id":"A9FFE801A1A24AEC97683A82E1DED0CF","id":"327AE046C4E94B10A9E1913C37CC845C","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### 4. 
Examine the distribution of $\\pi$ when $y=1$"]},{"cell_type":"code","execution_count":28,"metadata":{"_id":"B07F0C7DA92949F8B37A353DD13C21AC","collapsed":false,"id":"5A432422434A4340A638852042F1DB65","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Keep only the simulations with y == 1 and count how often each pi value occurs;\n","# the relative frequencies of these counts approximate the posterior f(pi | y = 1)\n","replicated_post = replicated_sim.loc[replicated_sim['y'] == 1, 'pi'].value_counts().sort_index()\n","replicated_post"]},{"cell_type":"code","execution_count":29,"metadata":{"_id":"E05EE4EBAB694DA3998A35F52F1512D9","collapsed":false,"id":"57BF0CF526A84C67A5A624D17DEAF1D4","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[],"trusted":true},"outputs":[],"source":["# Subset of simulations in which exactly 1 of the 6 studies replicated\n","replicated_post = replicated_sim[replicated_sim['y'] == 1]\n","\n","# Histogram of the pi values within this subset\n","replicated_post_plot = sns.histplot(data=replicated_post, x=\"pi\")\n","\n","# Show x-axis ticks only at the three candidate pi values\n","replicated_post_plot.set(xticks=[0.2,0.5,0.8])\n","sns.despine()"]},{"cell_type":"markdown","metadata":{"id":"0A83B87D55634BE9AE6B3F2F73593ABA","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Question: How would the frequentist (classical) approach handle the two problems above?  \n","* The replicability of a given study  \n","* The success rate across 6 replications"]},{"cell_type":"markdown","metadata":{"_id":"CA3B0BC65DD34D54AE5114F0CE22A127","id":"E209C1087EC14867ADE7F8E0718A9EE7","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["## Part 4: 
Comparing the frequentist and Bayesian schools"]},{"cell_type":"markdown","metadata":{"_id":"C7BE74D77FA743B1AA2FAEE498C98B43","id":"5291DFE272814754BC89983946594998","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["A recent review discusses the use of Bayesian methods in the design and analysis of clinical research and compares the philosophical and methodological differences between Bayesian and frequentist approaches.  \n","\n","\n","![Image Name](https://cdn.kesci.com/upload/sjzj8nd4te.png?imageView2/0/w/640/h/640)  \n","\n","\n","One example presented in the paper is a trial of extracorporeal membrane oxygenation (ECMO) for severe acute respiratory distress syndrome (ARDS). Its results sparked discussion because frequentist and Bayesian analyses of the same data reached different conclusions.  \n","\n","> Goligher, E. C., Heath, A., & Harhay, M. O. (2024). Bayesian statistics for clinical research. The Lancet, 404(10457), 1067-1076."]},{"cell_type":"markdown","metadata":{"id":"638A4E3F4A5648BDA438F47FB0394958","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["- Frequentist: The trial originally planned to enroll 331 patients, but because an interim analysis failed to demonstrate a significant benefit of ECMO, only 249 patients were ultimately enrolled. Mortality was 35% in the intervention group and 46% in the control group, which at face value looks like a substantial treatment effect. However, the frequentist analysis gave $p$ = 0.09, which does not reach the conventional significance threshold ($p$ < 0.05).  \n","- The investigators therefore concluded that the trial did not provide sufficient evidence that early ECMO significantly reduces mortality."]},{"cell_type":"markdown","metadata":{"id":"F39AA5D146D645D9912EFAFC6A61B313","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["- Bayesian: Across a range of different prior distributions, the posterior probability that H<sub>1</sub> holds (ECMO reduces mortality in the intervention group) was between 88% and 99%.  \n","- In other words, the Bayesian analysis provides strong evidence for a benefit of ECMO, and some researchers have even suggested that ECMO should be regarded as an effective treatment."]},{"cell_type":"markdown","metadata":{"_id":"A842C9E2D7844F2BBDCB08A4F2140E41","id":"262B59BDB60343D08FDE2A8913322145","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### 
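How does the frequentist school view the world?  \n"]},{"cell_type":"markdown","metadata":{"_id":"B6C1C56FE41D487D9365CE0A4F715A18","id":"5CE7E7BD0E2140A8A988F8A0AB224587","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Before contrasting the frequentist and Bayesian schools, let us first review how the frequentist school views the world."]},{"cell_type":"markdown","metadata":{"_id":"3C44FAF81D784D8E9118803F62AE7B92","id":"4C2B0089D96C40DA98C4BB2952AFB173","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Key points:  \n","\n","1. A fixed hypothesis: frequentists treat the hypothesis (usually the null hypothesis) as a fixed proposition. For example, in this trial the frequentist null hypothesis is “ECMO has no effect on mortality.”  \n","\n","2. Randomness of the data: in the frequentist framework, data are treated as random variables. The analysis focuses on the probability of observing the current data, or data more extreme, assuming the hypothesis is true; this probability is the $p$-value.  \n","\n","3. The assumption of infinitely repeated experiments: frequentist inference relies on imagining that the experiment could be repeated indefinitely, and reasons about the truth from how often the observed data would arise across those repetitions. Confidence intervals are likewise defined by this long-run frequency distribution.  \n","\n","4. Rejecting or not rejecting the null hypothesis: based on the computed $p$-value, frequentists decide whether to reject the null hypothesis at a pre-specified significance level (usually 0.05)."]},{"cell_type":"markdown","metadata":{"_id":"3EE62AF4C66848F49FD783BB13F6089C","id":"D8216D2DEBFD4E178881E376C893FCA9","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Finally, how does the frequentist approach **infer** a difference between two populations?  \n","- Frequentists assess significance with null hypothesis significance testing (NHST), using confidence intervals and $p$-values to support the inference.  \n","- In this clinical trial, the difference between the two populations was inferred from the $p$-value (here, 0.09) and the confidence interval."]},{"cell_type":"markdown","metadata":{"_id":"98FA33B154154FD394615B7C7E3E480E","id":"8F9638552FA84E7183D02A7838846425","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### 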
How does the Bayesian school view the world?"]},{"cell_type":"markdown","metadata":{"_id":"D35A738FCE8E4F969B703E782C526F08","id":"BCFF2B6F689241BD89E8D7884F0494E5","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["Unlike the frequentist school, the Bayesian school regards probability as a subjective measure of uncertainty.  \n","The core ideas of the Bayesian school include:  \n","1. Prior probability: Bayesian analysis starts from the researcher's initial belief about a hypothesis (the prior probability). This belief can be based on previous studies, expert opinion, or clinical experience.  \n","2. Updating beliefs: when new data (such as trial results) arrive, Bayes' theorem provides a framework for combining the prior probability with the new evidence (expressed through the likelihood function) to produce the posterior probability. The posterior represents the updated belief, i.e., the probability that a hypothesis is true after observing the new data.  \n","3. Posterior distributions and credible intervals: with the posterior, Bayesian methods can directly assess how likely a hypothesis is to be true. For example, a Bayesian analysis can directly report the posterior probability that H<sub>1</sub> holds (e.g., 88%). Credible intervals are also more intuitive: given the current data and prior information, a credible interval states the probability that the parameter lies within that interval."]},{"cell_type":"markdown","metadata":{"_id":"B16CE4FED77B4F8B90B7A4477D3FD9D4","id":"517FE6D700F1405390BA517A979B60E6","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Thomas Bayes  \n","![Image Name](https://pic2.zhimg.com/v2-ae48785e2b67af851e236b3d38c78c8d_r.jpg)  \n"]},{"cell_type":"markdown","metadata":{"_id":"E75F03A64A1F4D96A7F1430A22D99B20","id":"C1EE19383C454C5EA5A6EE4F1E5CD637","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Pierre-Simon Laplace  \n","\n","![Image Name](https://th.bing.com/th/id/R.c252b05834293b10a3005882940d6622?rik=Kr8G5HIK%2fObbHw&riu=http%3a%2f%2fimages.fineartamerica.com%2fimages-medium-large%2fpierre-simon-marquis-de-laplace-maria-platt-evans.jpg&ehk=uHIIZ0qdCLmD0FXAHR4lUGfySQGNKlhNkJgoWIOMJG4%3d&risl=&pid=ImgRaw&r=0)  
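\n"]},{"cell_type":"markdown","metadata":{"_id":"5AD514DB85CD477D922B690EBB91FFAC","id":"03E4B0C45366435281444A1556F2CCE4","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":["Comment"]},"source":["### Comparing the two schools  \n","\n","| **Frequentist school** | **Bayesian school** |  \n","|---|---|  \n","| **Definition of probability**: the frequency of an event over infinitely repeated trials | **Definition of probability**: a measure of belief in a hypothesis |  \n","| **Hypotheses**: hypotheses (parameters) are fixed; data are random | **Hypotheses**: hypotheses (parameters) are random; data are fixed |  \n","| **Inference**: hypothesis testing; the $p$-value determines whether to reject the null hypothesis | **Inference**: combine the prior with new data to compute the posterior probability |  \n","| **Confidence interval**: over repeated experiments, 95% of such intervals contain the true parameter | **Credible interval**: the probability that the parameter lies within the interval (e.g., 95% credibility) |  \n","| **$p$-value**: the probability, under the null hypothesis, of the observed data or more extreme data | **Posterior probability**: the updated probability that the hypothesis is true |  \n","| **Data independence**: inference uses only the current trial's data, without prior information | **Prior information**: historical data or expert opinion are incorporated to update inferences |  \n","| **Repeatability assumption**: inference rests on hypothetical repetitions of the experiment | **Accumulating information**: hypotheses are continually updated and refined as new data arrive |  \n","| **Adaptivity**: the design is fixed and cannot be updated or adjusted mid-trial | **Adaptivity**: trial design and decisions can be adjusted flexibly, e.g., in adaptive trials |  \n","\n","Source:  \n","> Goligher, E. C., Heath, A., & Harhay, M. O. (2024). Bayesian statistics for clinical research. 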
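The Lancet, 404(10457), 1067-1076."]},{"cell_type":"markdown","metadata":{"_id":"0544A3BE422D4BC1818D91F7704856D9","id":"F742409602BC4CD29269E4E0F36F8CDB","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["### Subjectivity in Bayesian analysis  \n","\n","**No statistical method can be completely objective, so subjectivity is a relative notion:**  \n","\n","* In Bayesian analysis, subjectivity enters through the choice of prior, which is stated explicitly; this transparency makes it less likely to mislead.  \n","\n","* In frequentist analysis, subjectivity is hidden in various **assumptions**, such as homogeneity of variance and normality in ANOVA. These seemingly “objective” assumptions are often hard to satisfy and are themselves subjective choices.  \n","\n","* More broadly, sampling, the choice of data-cleaning procedures, the choice of analysis method, and the choice of $p$-value threshold all involve subjectivity. Frequentist analysis is therefore not as “objective” as it may seem.  \n","\n","* Subjectivity is not necessarily a bad thing: it allows individual experience and expert knowledge to be integrated into data analysis in a quantitative way.  \n","\n","\n"]},{"cell_type":"markdown","metadata":{"_id":"9FC3B71C668D4AEE92B138FA14DC5138","id":"086B626967494E559BB660F7380ECE10","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### The different roles of repeated sampling"]},{"cell_type":"markdown","metadata":{"_id":"0B90EAC736154922A720436EFE54E554","id":"F257DBD4E74949739BB6290ECBDE060C","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["##### Frequentist school  \n","* Statistical inference relies on the **sampling distribution** of the estimates: if sampling were repeated indefinitely (in the long run), the sample estimates of a parameter would follow some distribution;  \n","* In null hypothesis significance testing (NHST), the interpretation of both the $p$-value and the confidence interval rests on the assumption of “infinitely repeated sampling”;  \n","* In practice, we usually collect data only once rather than sampling repeatedly, and in some situations the assumption of “infinitely repeated sampling” is not reasonable;  \n","\n","##### Bayesian school  \n","* Parameters are themselves treated as distributions, so uncertainty is carried through the inference;  \n","* Prior beliefs are updated directly in light of the data;  \n","\n","**Confidence intervals vs. credible intervals**  \n","\n","**No free lunch: 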
each approach has its own strengths and weaknesses**"]},{"cell_type":"markdown","metadata":{"_id":"C8726BADC78B4D119E9131D177873C4D","id":"5C1D8EFCDC214282AFB2A2DACFD4087B","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### Different priors and likelihoods produce different posterior distributions  \n","\n","![Image Name](https://cdn.kesci.com/upload/image/rhqcb9gji7.png?imageView2/0/w/500/h/500)  \n"]},{"cell_type":"markdown","metadata":{"_id":"FC6BA06B185F4E489BC5CBD9BEF37A95","id":"B5B2001762FC4D7D960A40369EC1B3CB","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["#### The \"weaknesses\" of NHST"]},{"cell_type":"markdown","metadata":{"_id":"6F25768311A548E1A9B25A0A401E1C97","id":"405698A4532D45BC96021B4D0DC4472B","jupyter":{},"notebookId":"66ea2e9f925f3bb1a291ea6b","runtime":{"execution_status":null,"is_visible":false,"status":"default"},"scrolled":false,"slideshow":{"slide_type":"slide"},"tags":[]},"source":["* It cannot directly provide support for the null hypothesis: if two populations do not differ significantly, how similar are they?  (许岳培 et al., 2023, *应用心理学(04)*, 369-384)  \n","\n","* It can only compare hypotheses about two populations at a time;  \n","\n","* Controlling false positives is a thorny problem."]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"}},"nbformat":4,"nbformat_minor":5}
